How to automate image analysis with the ChatGPT vision API and Grafana Cloud Metrics
OpenAI’s ChatGPT has an extraordinary ability to process natural language, reason about a user’s prompts, and generate human-like conversation in response. However, as the saying goes, “a picture is worth a thousand words” — and perhaps an even more significant achievement is ChatGPT’s ability to understand and answer questions about images.
In this post, we’ll walk through an example of how to use ChatGPT’s vision capabilities, officially called GPT-4 with vision (or GPT-4V), to identify objects in images and then automatically plot the results as metrics in Grafana Cloud. Our example uses publicly available images from the United States National Park Service, but by the end, you’ll be able to apply the same computer vision techniques and Grafana Cloud Metrics to your own use cases.
Example: count the number of vehicles entering Yellowstone
The following example illustrates how to automate the process of image analysis — a previously time-intensive and manual task performed by humans. Basically, we’ll have an AI-agent-turned-intelligence-analyst at our fingertips.
Let’s get started.
Task
We want to count the number of vehicles waiting to enter Yellowstone National Park from the North Gate, often referred to as the Roosevelt Arch. We will then save the output in a metric time series that we can view in a graph using Grafana Cloud.
Prerequisites
- Beginner programming skills in Ruby
- Free Grafana Cloud account
Inputs provided
- Example input images for your AI agent to match
- An image to inspect that refreshes over time
Process
Step 1: Initialize your OpenAI API client.
# The OpenAI::Client class is provided by the ruby-openai gem
require "openai"

@openai_client = OpenAI::Client.new(
  access_token: ENV["OPENAI_API_KEY"],
  organization_id: ENV["OPENAI_API_ORGANIZATION_ID"]
)
Step 2: Prepare your ChatGPT vision prompts. These prompts tell ChatGPT what to do with the images provided.
Note that we provide two images of example vehicles that we’d like ChatGPT to identify. Our third image is from a security camera at the Yellowstone Roosevelt Arch entrance, and it updates every few minutes as new images of the entrance become available. Finally, we give OpenAI an example of the output we expect: a machine-readable JSON object with a single key, "matches".
# System prompt: defines the analyst role and what counts as a match
system_context = "You are an expert image analyst capable of identifying patterns between images. You count a match when you find an object in the third image that looks like a car or truck from the first or second images. Only count a match if you're very confident a match exists."

# User prompt: one question followed by three images.
# Images 1 and 2 are reference vehicles; image 3 is the live webcam frame.
user_messages = [
  { type: "text", text: "How many times does the object from the first or second image appear in the third image? Be precise." },
  { type: "image_url",
    image_url: {
      url: "https://images.unsplash.com/photo-1616549972169-0a0d961c9905"
    }
  },
  { type: "image_url",
    image_url: {
      url: "https://images.unsplash.com/photo-1544601640-b256c49a192d"
    }
  },
  { type: "image_url",
    image_url: {
      url: "https://www.nps.gov/webcams-yell/mammoth_arch.jpg"
    }
  }
]

# Shape of the JSON object we want back from the model
example_output = '
Example response object:
{
  "matches": integer
}
'
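If you expect to inspect more than one camera, it may be convenient to build the message list from data instead of writing it out by hand. The following is a minimal sketch rather than part of the original script: the helper name `build_user_messages` and its parameters are hypothetical, and it only reproduces the message structure shown above (one text part followed by the reference images and, last, the image to inspect).

```ruby
# Hypothetical helper: builds the user message array for the vision prompt.
#   question       - the text question for the model
#   reference_urls - example images of the objects to look for
#   target_url     - the image to inspect (always placed last)
def build_user_messages(question, reference_urls, target_url)
  image_parts = (reference_urls + [target_url]).map do |url|
    { type: "image_url", image_url: { url: url } }
  end
  [{ type: "text", text: question }] + image_parts
end

user_messages = build_user_messages(
  "How many times does the object from the first or second image appear in the third image? Be precise.",
  [
    "https://images.unsplash.com/photo-1616549972169-0a0d961c9905",
    "https://images.unsplash.com/photo-1544601640-b256c49a192d"
  ],
  "https://www.nps.gov/webcams-yell/mammoth_arch.jpg"
)
```

Because the question refers to "the third image," this sketch assumes exactly two reference images; adjust the wording if you pass a different number.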
Step 3: Call the ChatGPT vision API with the gpt-4-vision-preview model.
begin
  response = @openai_client.chat(
    parameters: {
      model: "gpt-4-vision-preview",
      messages: [
        { role: "system", content: system_context },
        { role: "system", content: example_output },
        { role: "user", content: user_messages }
      ],
      temperature: 0.4,
      max_tokens: 100
    }
  )
rescue => err
  logger.fatal(err)
  return
else
  logger.info("OpenAI API response received and successfully processed")
  logger.info("Response:\n#{response}")
end
Step 4: Save the response from OpenAI, which should be a JSON object with one key/value pair, e.g. { "matches": integer }. Be sure to compare the result against the prompt’s input images to verify the accuracy of ChatGPT’s results.
require "json"

# Parse the JSON text in the first choice's message into a Ruby hash
hash_results = JSON.parse(response.dig("choices", 0, "message", "content"))
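The model's reply is free text, so it may not always be a bare JSON object; for example, it could be wrapped in a Markdown code fence or be a refusal in plain prose. As a hedged sketch (the helper name `parse_matches` is hypothetical and not part of the original script), one way to guard the parsing step is to extract the outermost braces, parse them, and accept only a non-negative integer:

```ruby
require "json"

# Hypothetical helper: extracts the integer "matches" value from the model's reply text.
# Returns nil when the reply cannot be interpreted, so the caller can skip that data point.
def parse_matches(content)
  return nil if content.nil?

  # Keep only the outermost {...} span, dropping any code fence or surrounding prose.
  json_text = content[/\{.*\}/m]
  return nil if json_text.nil?

  matches = JSON.parse(json_text)["matches"]
  matches.is_a?(Integer) && matches >= 0 ? matches : nil
rescue JSON::ParserError
  nil
end

plain_reply  = parse_matches('{ "matches": 3 }')
fenced_reply = parse_matches("```json\n{ \"matches\": 5 }\n```")
prose_reply  = parse_matches("Sorry, I cannot analyze these images.")
```

In the script above, this would be called as `parse_matches(response.dig("choices", 0, "message", "content"))`, and a `nil` result would mean no metric is pushed for that run.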
Step 5: Push the matches result from OpenAI’s inspection of the sample image to Grafana Cloud Metrics.
Below, we use the Influx Line Protocol format to write one metric at a time; Grafana Cloud’s backend translates it into a Prometheus metric. Notice the metric name and label convention, which lets you expand from this single example to track metrics from more than one entrance in a single graph. You can find your endpoint URL, as well as the required metrics write credentials, in your Grafana Cloud portal.
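The request below sends a Basic Authorization header built from `@grafana_base64_encoded_auth_token`, but the post does not show how that value is created. The following sketch assumes standard HTTP Basic authentication, with your metrics instance ID as the username and an access token with metrics write permission as the password. The helper name, the environment variable names, and the placeholder defaults are all hypothetical.

```ruby
# Hypothetical helper: encodes "username:password" for an HTTP Basic Authorization header.
# pack("m0") produces Base64 without line breaks (same result as Base64.strict_encode64).
def basic_auth_token(username, password)
  ["#{username}:#{password}"].pack("m0")
end

# Assumed environment variable names; the defaults are placeholders, not real credentials.
@grafana_base64_encoded_auth_token = basic_auth_token(
  ENV.fetch("GRAFANA_CLOUD_METRICS_USER_ID", "123456"),
  ENV.fetch("GRAFANA_CLOUD_METRICS_API_TOKEN", "example-token")
)
```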
require "net/http"
require "uri"

#
# Save metric in the Influx Line Protocol format
#
metrics_payload = "nps_entrance,park=yellowstone vehicles=#{hash_results['matches']}"

#
# Push metric to Grafana Cloud using the Influx Line Protocol
#
begin
  uri = URI.parse(ENV['GRAFANA_CLOUD_METRICS_INFLUX_PROXY_ENDPOINT'])
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |client|
    # request_uri keeps any query string and is never empty, unlike uri.path
    request = Net::HTTP::Post.new(uri.request_uri)
    request.body = metrics_payload
    request["Authorization"] = "Basic #{@grafana_base64_encoded_auth_token}"
    request["Content-Type"] = "text/plain"
    client.request(request)
  end
rescue => err
  logger.fatal(err)
  return
else
  logger.info 'Grafana Cloud response:'
  logger.info response.code
  logger.info ''
  return
end
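To expand from one entrance to several, as the label convention above is meant to allow, the payload could be built with a small formatter. This is only a sketch: the helper names, the `entrance` label, and the vehicle counts are illustrative assumptions, and it relies on the Influx Line Protocol rules that commas, equals signs, and spaces in tag keys and values are backslash-escaped and that multiple points are separated by newlines.

```ruby
# Escapes the characters that Influx Line Protocol treats as delimiters in tag keys and values.
def escape_tag(value)
  value.to_s.gsub(/[,= ]/) { |char| "\\#{char}" }
end

# Hypothetical helper: formats one line as "measurement,tag=value,... field=value,...".
# Field values are written as-is, matching the numeric field used in the original payload.
def line_protocol(measurement, tags, fields)
  tag_part = tags.map { |key, value| ",#{escape_tag(key)}=#{escape_tag(value)}" }.join
  field_part = fields.map { |key, value| "#{escape_tag(key)}=#{value}" }.join(",")
  "#{measurement}#{tag_part} #{field_part}"
end

# Made-up counts for two entrances, for illustration only.
counts = { "north" => 4, "west" => 11 }

metrics_payload = counts.map do |entrance, vehicles|
  line_protocol("nps_entrance", { park: "yellowstone", entrance: entrance }, { vehicles: vehicles })
end.join("\n")
```

Each line becomes its own time series, distinguished by the `entrance` label, so the entrances can be plotted together in one graph.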
Step 6: Visit your Grafana Cloud instance to graph your metric(s) over time. We suggest starting by using the Explore page and then selecting the example metric from the dropdown list.
To record the number of vehicle matches over an extended period of time, we suggest setting up this program to execute every 3-5 minutes as the camera feed refreshes.
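One simple way to run the analysis on that schedule is a loop that sleeps between runs; a cron job or another scheduler would work just as well. The sketch below is an assumption about how you might wire this up, not part of the original script, and the helper name `run_every` and its `iterations` parameter are hypothetical.

```ruby
# Hypothetical helper: calls the given block repeatedly, sleeping between runs.
#   interval_seconds - pause between runs (e.g. 240 for every 4 minutes)
#   iterations       - nil runs forever; a number limits the runs (handy for testing)
# Returns the number of runs attempted.
def run_every(interval_seconds, iterations: nil)
  count = 0
  while iterations.nil? || count < iterations
    begin
      yield
    rescue StandardError => err
      # A single failed run should not stop the schedule.
      warn "Run failed: #{err.message}"
    end
    count += 1
    sleep(interval_seconds) unless iterations && count >= iterations
  end
  count
end

# Example: a stand-in block runs twice with no delay between runs.
runs = []
completed = run_every(0, iterations: 2) { runs << Time.now }
```

In practice, the block would wrap the steps above (call the vision API, parse the result, push the metric), for example `run_every(240) { ... }`.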
That’s it! For a code-complete version of this example using ChatGPT vision capabilities and Grafana Cloud Metrics, please visit the following GitHub Gist. If you want to go one step further, and monitor the costs and resource usage of your OpenAI scripts, check out our OpenAI Integration.
If you have questions or get stuck, feel free to ask for help in our Community Forums or reach out to our Support team and we’ll be glad to help.
Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!