Prompt Engineering with Claude 3 Haiku


10 days after Anthropic’s Claude 3 Sonnet landed in Amazon Bedrock, the Claude 3 Haiku model is now available on Amazon Bedrock as well. Haiku is one of the most affordable and fastest options on the market in its intelligence category. As you can see in the following table, Haiku’s pricing is very competitive within the Claude family. But what about the quality? That’s what I am going to find out in this post.

I again used Image-Reader to run the tests. The expected result is a JSON string with the Claude model prices in the US East region.

In the first test, I ran the same query “Create JSON to indicate the Claude model price in US East. Only output the JSON with nothing else. ```” with Sonnet and Haiku respectively. (Note: the trailing ``` puts words into the model’s mouth, so the model continues inside a code block.) As you can see from the following result, Sonnet gave exactly what I needed. Haiku’s invocation speed, on the other hand, was very impressive (for both first byte latency and invocation latency), but its response missed important information (the distinction between input and output token prices).

Model: claude-3-sonnet-28k-20240229
Response:
{
  "Claude Instant": {
    "input_price_per_1000_tokens": 0.00080,
    "output_price_per_1000_tokens": 0.00240
  },
  "Claude 2.0/2.1": {
    "input_price_per_1000_tokens": 0.00800,
    "output_price_per_1000_tokens": 0.02400
  },
  "Claude 3 Sonnet": {
    "input_price_per_1000_tokens": 0.00300,
    "output_price_per_1000_tokens": 0.01500
  },
  "Claude 3 Haiku": {
    "input_price_per_1000_tokens": 0.00025,
    "output_price_per_1000_tokens": 0.00125
  }
}
Input Tokens: 1641 | Output Tokens: 225 | Invocation Latency: 7355 ms | First Byte Latency: 1701 ms

Model: claude-3-haiku-48k-20240307
Response:
{ "Claude Instant": 0.00080, "Claude 2.0/2.1": 0.00800, "Claude 3 Sonnet": 0.00300, "Claude 3 Haiku": 0.00025 }
Input Tokens: 1687 | Output Tokens: 71 | Invocation Latency: 1531 ms | First Byte Latency: 875 ms
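The “put words into the model’s mouth” trick works by appending a partial assistant turn to the messages list, which the model then continues. Below is a minimal sketch of how such a request body might be assembled for the Bedrock Messages API; the function name is mine, and the actual `invoke_model` call is commented out since it needs AWS credentials and Bedrock model access.

```python
import json

# Three backticks, built programmatically to keep them out of this code block.
CODE_FENCE = "`" * 3

def build_prefill_request(prompt: str, prefill: str = CODE_FENCE) -> str:
    """Build a Bedrock Messages API request body with an assistant prefill.

    The trailing assistant turn "puts words into the model's mouth":
    the model continues from the prefill, so its reply starts inside
    a code block instead of with conversational filler.
    """
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": prefill},  # partial turn the model continues
        ],
    }
    return json.dumps(body)

body = build_prefill_request(
    "Create JSON to indicate the Claude model price in US East. "
    "Only output the JSON with nothing else."
)
# The actual invocation would then look roughly like:
# client = boto3.client("bedrock-runtime")
# response = client.invoke_model(
#     modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body)
```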

In the second test, I tried a different prompt, “Create JSON to indicate the Claude model price in US East. Only output the JSON with nothing else. Please think step by step. ```”, to see whether CoT (Chain of Thought) can help.

The result looks much better in terms of response quality, and the invocation is still fast. However, since the thinking process is included in the response, it’s hard to extract the JSON: the output format is unpredictable.

Model: claude-3-haiku-48k-20240307
Response:
To create the JSON for the Claude model price in the US East region, I will follow these steps:
1. Identify the relevant information from the table:
   - The region is "US East (N. Virginia) and US West (Oregon)"
   - The Claude model prices are listed under the "Price per 1,000 input tokens" and "Price per 1,000 output tokens" columns
2. Extract the prices for the Claude models in the US East region:
   - Claude Instant: $0.00080 per 1,000 input tokens, $0.00240 per 1,000 output tokens
   - Claude 2.0/2.1: $0.00800 per 1,000 input tokens, $0.02400 per 1,000 output tokens
   - Claude 3 Sonnet: $0.00300 per 1,000 input tokens, $0.01500 per 1,000 output tokens
   - Claude 3 Haiku: $0.00025 per 1,000 input tokens, $0.00125 per 1,000 output tokens
3. Format the information into a JSON object:
{ "Claude Instant": { "price_per_1000_input_tokens": 0.00080, "price_per_1000_output_tokens": 0.00240 }, "Claude 2.0/2.1": { "price_per_1000_input_tokens": 0.00800, "price_per_1000_output_tokens": 0.02400 }, "Claude 3 Sonnet": { "price_per_1000_input_tokens": 0.00300, "price_per_1000_output_tokens": 0.01500 }, "Claude 3 Haiku": { "price_per_1000_input_tokens": 0.00025, "price_per_1000_output_tokens": 0.00125 } }
Input Tokens: 1694 | Output Tokens: 494 | Invocation Latency: 4220 ms | First Byte Latency: 886 ms
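One way to cope with the unpredictable CoT format is to scan the response for the first balanced pair of braces and parse whatever sits between them. A rough sketch (the function name is mine; it assumes the response contains one top-level JSON object and that no braces appear inside string values):

```python
import json

def extract_json(text: str) -> dict:
    """Pull the first balanced {...} object out of free-form model output."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # outermost object just closed
                return json.loads(text[start:i + 1])
    raise ValueError("unbalanced braces in response")

cot_response = (
    "To create the JSON, I will follow these steps: ... "
    '{ "Claude 3 Haiku": { "input": 0.00025, "output": 0.00125 } } Done.'
)
print(extract_json(cot_response))
```

This is brittle by design, which is exactly why the next test asks the model to tag its answer instead.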

In the third test, I added some extra guidance to the previous prompt.

Create JSON to indicate the Claude model price in US East. Only output the JSON with nothing else. Please think step by step, and keep your thinking between the following tag:
<thinking>
{thinking}
</thinking>
Keep the answer in the following tag:
<answer>
{answer}
</answer>

The response still looks good and is more organised; now I just need to extract the JSON between the <answer></answer> tags and pass it downstream if needed. The invocation latency is still low.

Model: claude-3-haiku-48k-20240307
Response:
<think>To create the JSON to indicate the Claude model price in US East, I need to look for the Claude model in the "Anthropic models" column and then find the corresponding price per 1,000 input tokens and price per 1,000 output tokens for the "US East (N. Virginia) and US West (Oregon)" region.
The Claude models listed are: Claude Instant, Claude 2.0/2.1, Claude 3 Sonnet, Claude 3 Haiku.
The price per 1,000 input tokens and price per 1,000 output tokens for the "US East (N. Virginia) and US West (Oregon)" region are listed for each of these models.</think>

<answer>{ "Claude Instant": { "price_per_1000_input_tokens": 0.00080, "price_per_1000_output_tokens": 0.00240 }, "Claude 2.0/2.1": { "price_per_1000_input_tokens": 0.00800, "price_per_1000_output_tokens": 0.02400 }, "Claude 3 Sonnet": { "price_per_1000_input_tokens": 0.00300, "price_per_1000_output_tokens": 0.01500 }, "Claude 3 Haiku": { "price_per_1000_input_tokens": 0.00025, "price_per_1000_output_tokens": 0.00125 } }</answer>
Input Tokens: 1737 | Output Tokens: 392 | Invocation Latency: 3830 ms | First Byte Latency: 875 ms
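With the answer fenced in tags, extraction becomes a simple regex. A minimal sketch (the function name is mine):

```python
import json
import re

def extract_answer(response: str) -> dict:
    """Return the JSON enclosed in <answer></answer> tags, parsed."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        raise ValueError("no <answer> tag in response")
    return json.loads(match.group(1))

response = (
    "<think>Looking up the prices for the US East region...</think>\n"
    '<answer>{ "Claude 3 Haiku": { "price_per_1000_input_tokens": 0.00025 } }</answer>'
)
print(extract_answer(response))
```

If parsing fails, the raised exception also gives downstream code a clean signal to retry the invocation.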

So the conclusions of the tests are:

  • Sonnet is smarter but slower than Haiku (1701 ms first byte latency, 7355 ms invocation latency).
  • Haiku offers very low first byte latency (875–886 ms across all 3 tests).
  • With prompt engineering, Haiku provides a reasonably good response and fast invocation latency (1531–4220 ms across all 3 tests).
