RAG solution on Amazon Bedrock - Part 2: Build the MCQ orchestrator using Bedrock Converse API
Generative AI Demo using the Well-Architected Machine Learning Lens PDF to prepare for Machine Learning Engineer Associate (MLA-C01) Certification
In the first part of this series, we set up Amazon OpenSearch Serverless using AWS CDK to build a vector store for our Retrieval-Augmented Generation (RAG) solution. In this second part, we will build the Multiple Choice Questions (MCQ) orchestrator using Amazon Bedrock’s Converse API. This orchestrator will generate and evaluate MCQs based on the Well-Architected Machine Learning Lens PDF to help candidates prepare for the AWS Machine Learning Engineer Associate (MLA-C01) certification.

Pre-requisites
- AWS Account
- AWS CLI
- NodeJS
- Python
- AWS CDK
- Bootstrap CDK with:
cdk bootstrap aws://ACCOUNT-NUMBER/REGION
- Visual Studio Code (or your favourite Editor)
- Amazon Bedrock Access
CDK Project Implementation
Ensure you have the AWS CDK installed and initialized as per the pre-requisites.
The complete source code for this project is available in our GitHub repository. Clone the v0.2.1 release branch and install the dependencies with the following commands:
git clone -b v0.2.1 https://github.com/awsdataarchitect/opensearch-bedrock-rag-cdk.git && cd opensearch-bedrock-rag-cdk
npm i
Ensure you have deployed the Amazon OpenSearch Serverless (AOSS) collection from the CDK project and indexed the AWS Well-Architected Machine Learning Lens PDF, as described in the first part of this series.
Explanation of Key Components in this release:
- app.py: This Streamlit application takes user input, sends it to the answer_query function, and displays the generated MCQs (a brief sketch follows this list).
- query_against_openSearch.py: This script calls Amazon Bedrock to generate embeddings, performs a KNN search on OpenSearch, and uses the Bedrock Converse API to generate and format the MCQs.
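As a rough illustration, the Streamlit front end might look like the minimal sketch below. The answer_query import reflects the description above; the exact function signature and widget layout in the repository may differ.

# app.py - illustrative sketch, not the exact repository code
import streamlit as st
from query_against_openSearch import answer_query  # assumed module and function names

st.title("MCQ Generator - Well-Architected ML Lens")

# Collect the exam topic from the user
topic = st.text_input("Enter a topic from the exam guide:")

if st.button("Generate MCQs") and topic:
    with st.spinner("Generating questions..."):
        # Delegate retrieval and MCQ generation to the orchestrator script
        mcqs = answer_query(topic)
    st.write(mcqs)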
Key Points
- Embedding Generation: The get_embedding function uses Amazon Bedrock to generate embeddings for the input text.
- KNN Search: Performs a KNN search on the OpenSearch vector index to find similar documents.
- Prompt Engineering: Constructs a prompt for generating MCQs based on the retrieved documents.
- Conversation Orchestrator: Uses the Bedrock Converse API to manage the conversation context and generate responses (see the sketch after this list).
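To make these steps concrete, here is a minimal sketch of the overall flow in query_against_openSearch.py. The model IDs, index name, vector field name, and collection endpoint below are assumptions for illustration; the actual values live in the repository and in the first part of this series.

# Illustrative sketch of the retrieval-and-generation flow; model IDs, index name,
# vector field name, and the collection endpoint are assumptions.
import json
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

REGION = "us-east-1"                                        # assumption
INDEX_NAME = "ml-lens-index"                                # assumption
EMBED_MODEL_ID = "amazon.titan-embed-text-v1"               # assumption
TEXT_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"   # assumption

session = boto3.Session()
bedrock = session.client("bedrock-runtime", region_name=REGION)
auth = AWSV4SignerAuth(session.get_credentials(), REGION, "aoss")
aoss = OpenSearch(hosts=[{"host": "<collection-endpoint>", "port": 443}],
                  http_auth=auth, use_ssl=True,
                  connection_class=RequestsHttpConnection)

def get_embedding(text):
    # Generate an embedding for the input text with a Bedrock embedding model
    response = bedrock.invoke_model(modelId=EMBED_MODEL_ID,
                                    body=json.dumps({"inputText": text}))
    return json.loads(response["body"].read())["embedding"]

def answer_query(topic):
    # 1. Embed the topic and run a KNN search against the vector index
    embedding = get_embedding(topic)
    hits = aoss.search(index=INDEX_NAME, body={
        "size": 3,
        "query": {"knn": {"vectors": {"vector": embedding, "k": 3}}},
    })["hits"]["hits"]
    context = "\n".join(hit["_source"]["text"] for hit in hits)  # "text" field is an assumption

    # 2. Construct the MCQ prompt from the retrieved passages
    prompt = ("Using only the context below, generate multiple-choice questions with "
              "four options, the correct answer, and an explanation.\n\n"
              f"Context:\n{context}\n\nTopic: {topic}")

    # 3. Generate the questions with the Bedrock Converse API
    response = bedrock.converse(modelId=TEXT_MODEL_ID,
                                messages=[{"role": "user", "content": [{"text": prompt}]}])
    return response["output"]["message"]["content"][0]["text"]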
Follow the steps below to use the application
Step 1: Set Up the Python Virtual Environment
First, set up a Python virtual environment in the root directory of this Proof of Concept (POC). Ensure you are using Python 3.10. Run the following commands:
pip install virtualenv
python3.10 -m venv venv
A virtual environment is essential for managing dependencies and ensuring consistency across development environments. If you need more guidance on setting one up, refer to the official Python venv documentation.
After creating the virtual environment, activate it with the following command:
source venv/bin/activate
Step 2: Install Required Packages
Once your virtual environment is active, install the required packages listed in the requirements.txt file. Run this command in the root directory of the POC:
pip install -r requirements.txt
Step 3: Configure Environment Variables
Next, configure your environment variables. Create a .env file in the root directory of the repository and add the following line:
profile_name=<CLI_profile_name>
Ensure your AWS CLI profile has access to Amazon Bedrock.
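For reference, the .env value is typically loaded with python-dotenv and used to build a boto3 session, roughly like this (the exact loading code in the repository may differ):

# Illustrative sketch of reading the CLI profile from .env
import os
import boto3
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory
profile_name = os.getenv("profile_name")

# Bedrock and OpenSearch calls are then signed with this profile
session = boto3.Session(profile_name=profile_name)
bedrock = session.client("bedrock-runtime")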
Step 4: Run the Application
After cloning the repository, creating and activating the virtual environment, installing the required packages, and setting up the .env file, your application is ready to go. Start it with the following command:
streamlit run app.py
Once the application is up and running in your browser, you can begin asking questions and generating natural language responses. The Amazon Bedrock Converse API will manage the conversation history and context.
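Under the hood, multi-turn context with the Converse API comes from passing the accumulated messages list back on every call. Here is a minimal sketch of that pattern; the model ID and helper function are assumptions, not the exact repository code.

# Illustrative sketch of carrying conversation context across turns with the Converse API
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"  # assumption
messages = []  # grows with each turn so the model sees the full exchange

def ask(user_text):
    messages.append({"role": "user", "content": [{"text": user_text}]})
    response = bedrock.converse(modelId=MODEL_ID, messages=messages)
    assistant_message = response["output"]["message"]
    messages.append(assistant_message)  # keep the reply so the next turn has context
    return assistant_message["content"][0]["text"]

print(ask("Generate 3 MCQs on offline and online model evaluation."))
print(ask("Make the second question harder."))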
Once the official study guide for the AWS Machine Learning Engineer Associate (MLA-C01) certification is released on August 13, 2024, you can use topic names from that guide to generate MCQs with our app. For now, I used the topic “Perform offline and online model evaluation (A/B testing)” from the MLS-C01 (AWS Machine Learning Specialty) exam study guide, as shown in the screenshot below:

Here are all the questions generated by the Bedrock Converse API for the sample topic using our RAG solution:
Question #1) Which of the following is NOT a reason to evaluate a model?
A) To determine if the model is overfitting or underfitting
B) To determine if the model is more sensitive or specific
C) To determine if the model is more accurate than a previous model
D) To determine if the model is more accurate than a random guess
Correct Answer:
D)
Explanation:
The model should be evaluated to determine if it is more accurate than a previous model, not a random guess.
Question #2) Which of the following is NOT a method for evaluating a model?
A) A/B testing
B) Cross-validation
C) Holdout validation
D) Random sampling
Correct Answer:
D)
Explanation:
Random sampling is not a method for evaluating a model, but rather a method for selecting a subset of data for training or testing.
Question #3) Which of the following is NOT a type of error rate that can be calculated for a multiclass model?
A) False positive rate
B) False negative rate
C) True positive rate
D) True negative rate
Correct Answer:
D)
Explanation:
In a multiclass model, there is no concept of a true negative rate, as there is no single "negative" class.
Question #4) Which of the following is NOT a type of evaluation that can be performed on a model?
A) Offline evaluation
B) Online evaluation
C) Sensitivity analysis
D) Specificity analysis
Correct Answer:
C)
Explanation:
Sensitivity analysis is a method for understanding how changes in input variables affect the output of a model, not a type of evaluation.
Question #5) Which of the following is NOT a benefit of A/B testing?
A) It allows for direct comparison of different models
B) It allows for testing of models with live data
C) It allows for testing of models with historical data
D) It allows for testing of models with synthetic data
Correct Answer:
D)
Explanation:
A/B testing is typically performed with live or historical data, not synthetic data.
Question #6) Which of the following is NOT a step in the A/B testing process?
A) Split the data into two groups
B) Train two models on the split data
C) Test the models on the split data
D) Compare the performance of the two models
E) Replace the old model with the new model if it performs better
Correct Answer:
D)
Explanation:
The step of comparing the performance of the two models is not explicitly mentioned in the A/B testing process.
Question #7) Which of the following is NOT a reason to perform offline evaluation?
A) To evaluate the model's performance on a subset of the data
B) To evaluate the model's performance on historical data
C) To evaluate the model's performance on live data
D) To evaluate the model's performance on synthetic data
Correct Answer:
C)
Explanation:
Offline evaluation is typically performed on historical or synthetic data, not live data.
Question #8) Which of the following is NOT a benefit of online evaluation?
A) It allows for testing of models with live data
B) It allows for direct comparison of different models
C) It allows for evaluation of the model's performance over time
D) It allows for evaluation of the model's performance on a subset of the data
Correct Answer:
D)
Explanation:
Online evaluation typically involves testing the model on all available data, not a subset of the data.
Question #9) Which of the following is NOT a type of evaluation metric that can be used to evaluate a model?
A) Precision
B) Recall
C) F1 score
D) Mean squared error
Correct Answer:
D)
Explanation:
Mean squared error is a type of loss function, not an evaluation metric.
Question #10) Which of the following is NOT a method for evaluating the performance of a multiclass model?
A) Confusion matrix
B) ROC curve
C) Precision-recall curve
D) Accuracy
Correct Answer:
D)
Explanation:
Accuracy is not a method for evaluating the performance of a multiclass model, but rather a metric that can be calculated from the results of other evaluation methods.
Question #11) Which of the following is NOT a type of evaluation that can be performed on a model?
A) Sensitivity analysis
B) Specificity analysis
C) Precision analysis
D) Recall analysis
Correct Answer:
A)
Explanation:
Sensitivity analysis is a method for understanding how changes in input variables affect the output of a model, not a type of evaluation.
Question #12) Which of the following is NOT a benefit of offline evaluation?
A) It allows for evaluation of the model's performance on a subset of the data
B) It allows for evaluation of the model's performance on historical data
C) It allows for testing of models with synthetic data
D) It allows for evaluation of the model's performance over time
Correct Answer:
D)
Explanation:
Offline evaluation typically involves testing the model on a fixed dataset, not over time.
Question #13) Which of the following is NOT a benefit of A/B testing?
A) It allows for direct comparison of different models
B) It allows for testing of models with live data
C) It allows for evaluation of the model's performance over time
D) It allows for testing of models with historical data
Correct Answer:
C)
Explanation:
A/B testing typically involves testing the model on a fixed dataset, not over time.
Question #14) Which of the following is NOT a type of evaluation that can be performed on a model?
A) Sensitivity analysis
B) Specificity analysis
C) Precision analysis
D) Recall analysis
Correct Answer:
A)
Explanation:
Sensitivity analysis is a method for understanding how changes in input variables affect the output of a model, not a type of evaluation.
Logging response metadata and token counts
The Converse method also returns metadata about the API call. We log the usage property, which includes the input and output token counts; this can help you understand the charges for your API call. The latencyMs value under metrics gives the latency of the call to Converse, in milliseconds.
You will see a response similar to the following in the Streamlit’s background screen:
usage: {'inputTokens': 579, 'outputTokens': 1600, 'totalTokens': 2179}
latencyMs: {'latencyMs': 34931}
Note that the displayed usage numbers are only for the last API call from our app. You can use these token counts to track the cost of the API call. You can read more about token-based pricing on the Amazon Bedrock pricing page.
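If you want to log these values yourself, a minimal sketch of reading them from the Converse response looks like this (the usage and metrics fields come from the API response; the logger setup here is illustrative):

# Illustrative sketch of logging token usage and latency from a Converse response
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def log_converse_metadata(response):
    # response is the dictionary returned by bedrock.converse(...)
    token_usage = response["usage"]                 # inputTokens, outputTokens, totalTokens
    latency_ms = response["metrics"]["latencyMs"]   # end-to-end call latency in milliseconds
    logger.info("usage: %s", token_usage)
    logger.info("latencyMs: %s", latency_ms)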
Clean-up
To delete all the resources, run the cdk destroy command. This removes every resource defined in the stack, freeing up the allocated infrastructure and eliminating the associated costs.
Conclusion
In this part of the series, we created a Streamlit application that leverages Amazon Bedrock’s Converse API to generate and evaluate multiple-choice questions. This MCQ orchestrator will help candidates prepare for the AWS Machine Learning Engineer Associate (MLA-C01) certification by providing targeted practice based on the Well-Architected Machine Learning Lens PDF. Stay tuned for the next part of this series, where we will further enhance our RAG solution.