Prompt Engineering with Claude 3 Haiku


10 days after Anthropic’s Claude 3 Sonnet landed in Amazon Bedrock, the Claude 3 Haiku model is now available on Amazon Bedrock as well. Haiku is one of the most affordable and fastest options on the market in its intelligence category. As you can see in the following table, Haiku’s pricing is very competitive within the Claude family. But what about the quality? That’s what I am going to find out in this post.

I again used Image-Reader to run the tests. The expected result is a JSON string with the Claude model prices in the US East region.

In the first test, I ran the same query “Create JSON to indicate the Claude model price in US East. Only output the JSON with nothing else. ```” with Sonnet and Haiku respectively. (Note: the trailing ``` puts words into the model’s mouth, so the model continues inside a code block.) As you can see from the following result, Sonnet gave exactly what I needed. Haiku’s invocation speed, on the other hand, was very impressive (for both first byte latency and invocation latency), but its response missed important information (the distinction between input and output token prices).

Model: claude-3-sonnet-28k-20240229
Response:
{
  "Claude Instant": {
    "input_price_per_1000_tokens": 0.00080,
    "output_price_per_1000_tokens": 0.00240
  },
  "Claude 2.0/2.1": {
    "input_price_per_1000_tokens": 0.00800,
    "output_price_per_1000_tokens": 0.02400
  },
  "Claude 3 Sonnet": {
    "input_price_per_1000_tokens": 0.00300,
    "output_price_per_1000_tokens": 0.01500
  },
  "Claude 3 Haiku": {
    "input_price_per_1000_tokens": 0.00025,
    "output_price_per_1000_tokens": 0.00125
  }
}
Input Tokens: 1641 | Output Tokens: 225 | Invocation Latency: 7355 ms | First Byte Latency: 1701 ms

Model: claude-3-haiku-48k-20240307
Response:
{ "Claude Instant": 0.00080, "Claude 2.0/2.1": 0.00800, "Claude 3 Sonnet": 0.00300, "Claude 3 Haiku": 0.00025 }
Input Tokens: 1687 | Output Tokens: 71 | Invocation Latency: 1531 ms | First Byte Latency: 875 ms
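The “put words into the model’s mouth” trick works by appending a partial assistant turn to the messages list, which the model then continues. Below is a minimal sketch of how such a request body might be assembled for the Bedrock Messages API; the function name is mine, and the actual `invoke_model` call is commented out since it needs AWS credentials and Bedrock model access.

```python
import json

# Three backticks, built programmatically to keep them out of this code block.
CODE_FENCE = "`" * 3

def build_prefill_request(prompt: str, prefill: str = CODE_FENCE) -> str:
    """Build a Bedrock Messages API request body with an assistant prefill.

    The trailing assistant turn "puts words into the model's mouth":
    the model continues from the prefill, so its reply starts inside
    a code block instead of with conversational filler.
    """
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": prefill},  # partial turn the model continues
        ],
    }
    return json.dumps(body)

body = build_prefill_request(
    "Create JSON to indicate the Claude model price in US East. "
    "Only output the JSON with nothing else."
)
# The actual invocation would then look roughly like:
# client = boto3.client("bedrock-runtime")
# response = client.invoke_model(
#     modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body)
```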

In the second test, I tried a different prompt, “Create JSON to indicate the Claude model price in US East. Only output the JSON with nothing else. Please think step by step. ```”, to see whether CoT (Chain of Thought) can help.

The result looks much better in terms of response quality, and the invocation is still fast. However, since the thinking process is included in the response, it’s hard to extract the JSON: the output format is unpredictable.

Model: claude-3-haiku-48k-20240307
Response:
To create the JSON for the Claude model price in the US East region, I will follow these steps:
1. Identify the relevant information from the table:
   - The region is "US East (N. Virginia) and US West (Oregon)"
   - The Claude model prices are listed under the "Price per 1,000 input tokens" and "Price per 1,000 output tokens" columns
2. Extract the prices for the Claude models in the US East region:
   - Claude Instant: $0.00080 per 1,000 input tokens, $0.00240 per 1,000 output tokens
   - Claude 2.0/2.1: $0.00800 per 1,000 input tokens, $0.02400 per 1,000 output tokens
   - Claude 3 Sonnet: $0.00300 per 1,000 input tokens, $0.01500 per 1,000 output tokens
   - Claude 3 Haiku: $0.00025 per 1,000 input tokens, $0.00125 per 1,000 output tokens
3. Format the information into a JSON object:
{ "Claude Instant": { "price_per_1000_input_tokens": 0.00080, "price_per_1000_output_tokens": 0.00240 }, "Claude 2.0/2.1": { "price_per_1000_input_tokens": 0.00800, "price_per_1000_output_tokens": 0.02400 }, "Claude 3 Sonnet": { "price_per_1000_input_tokens": 0.00300, "price_per_1000_output_tokens": 0.01500 }, "Claude 3 Haiku": { "price_per_1000_input_tokens": 0.00025, "price_per_1000_output_tokens": 0.00125 } }
Input Tokens: 1694 | Output Tokens: 494 | Invocation Latency: 4220 ms | First Byte Latency: 886 ms
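One way to cope with the unpredictable CoT format is to scan the response for the first balanced pair of braces and parse whatever sits between them. A rough sketch (the function name is mine; it assumes the response contains one top-level JSON object and that no braces appear inside string values):

```python
import json

def extract_json(text: str) -> dict:
    """Pull the first balanced {...} object out of free-form model output."""
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    depth = 0
    for i, ch in enumerate(text[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # outermost object just closed
                return json.loads(text[start:i + 1])
    raise ValueError("unbalanced braces in response")

cot_response = (
    "To create the JSON, I will follow these steps: ... "
    '{ "Claude 3 Haiku": { "input": 0.00025, "output": 0.00125 } } Done.'
)
print(extract_json(cot_response))
```

This is brittle by design, which is exactly why the next test asks the model to tag its answer instead.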

In the third test, I added some extra guidance to the previous prompt.

Create JSON to indicate the Claude model price in US East. Only output the JSON with nothing else. Please think step by step, and keep your thinking between the following tag:
<thinking>
{thinking}
</thinking>
Keep the answer in the following tag:
<answer>
{answer}
</answer>

The response still looks good and is more organised; now I just need to extract the JSON between the <answer></answer> tags and pass it downstream if needed. The invocation latency is still low.

Model: claude-3-haiku-48k-20240307
Response:
<think>To create the JSON to indicate the Claude model price in US East, I need to look for the Claude model in the "Anthropic models" column and then find the corresponding price per 1,000 input tokens and price per 1,000 output tokens for the "US East (N. Virginia) and US West (Oregon)" region.
The Claude models listed are: Claude Instant, Claude 2.0/2.1, Claude 3 Sonnet, Claude 3 Haiku.
The price per 1,000 input tokens and price per 1,000 output tokens for the "US East (N. Virginia) and US West (Oregon)" region are listed for each of these models.</think>

<answer>{ "Claude Instant": { "price_per_1000_input_tokens": 0.00080, "price_per_1000_output_tokens": 0.00240 }, "Claude 2.0/2.1": { "price_per_1000_input_tokens": 0.00800, "price_per_1000_output_tokens": 0.02400 }, "Claude 3 Sonnet": { "price_per_1000_input_tokens": 0.00300, "price_per_1000_output_tokens": 0.01500 }, "Claude 3 Haiku": { "price_per_1000_input_tokens": 0.00025, "price_per_1000_output_tokens": 0.00125 } }</answer>
Input Tokens: 1737 | Output Tokens: 392 | Invocation Latency: 3830 ms | First Byte Latency: 875 ms
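With the answer fenced in tags, extraction becomes a simple regex. A minimal sketch (the function name is mine):

```python
import json
import re

def extract_answer(response: str) -> dict:
    """Return the JSON enclosed in <answer></answer> tags, parsed."""
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        raise ValueError("no <answer> tag in response")
    return json.loads(match.group(1))

response = (
    "<think>Looking up the prices for the US East region...</think>\n"
    '<answer>{ "Claude 3 Haiku": { "price_per_1000_input_tokens": 0.00025 } }</answer>'
)
print(extract_answer(response))
```

If parsing fails, the raised exception also gives downstream code a clean signal to retry the invocation.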

So the conclusions of the tests are:

  • Sonnet is smarter but slower than Haiku (1701 ms first byte latency, 7355 ms invocation latency).
  • Haiku offers very low first byte latency (875–886 ms across all 3 tests).
  • With prompt engineering, Haiku provides a reasonably good response and fast invocation latency (1531–4220 ms across all 3 tests).
