Increase max input length for HuggingFace model in SageMaker deployment

I deployed HuggingFace zephyr-7b-beta model to SageMaker by using the default deploy.py script. When trying to invoke the model endpoint, I received the error “ValueError: Error raised by inference endpoint: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (422) from primary with message “{“error”:”Input validation error: inputs must have less than 1024 tokens.”

It turns out that the default max input length is 1024 tokens. To increase it, I need to pass the MAX_INPUT_LENGTH parameter in the env setting. Here is the full code

	import json
	import sagemaker
	import boto3
	from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

	try:
	role = sagemaker.get_execution_role()
	except ValueError:
	iam = boto3.client('iam')
	role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

	# Hub Model configuration. https://huggingface.co/models
	hub = {
	'HF_MODEL_ID':'HuggingFaceH4/zephyr-7b-beta',
	'SM_NUM_GPUS': json.dumps(1),
	'MAX_TOTAL_TOKENS': json.dumps(4096),
	'MAX_INPUT_LENGTH': json.dumps(3000),
	}


	# create Hugging Face Model Class
	huggingface_model = HuggingFaceModel(
	image_uri=get_huggingface_llm_image_uri("huggingface",version="1.1.0"),
	env=hub,
	role=role,

	)

	# deploy model to SageMaker Inference
	predictor = huggingface_model.deploy(
	initial_instance_count=1,
	instance_type="ml.g5.2xlarge",
	container_startup_health_check_timeout=300,
	)

	# send request
	predictor.predict({
	"inputs": "<\|system\|>\nYou are a pirate chatbot who always responds with Arr!</s>\n<\|user\|>\nThere's a llama on my lawn, how can I get rid of him?</s>\n<\|assistant\|>\n",
	})

view raw deploy_zephyr_7b_beta_in_sagemaker.py hosted with ❤ by GitHub

Reference: https://huggingface.co/docs/text-generation-inference/basic_tutorials/launcher#maxinputlength

	Unleashing the Power… on Image-Reader: A project to exp…
	Bob on Build docker image with kaniko…
	Voces De La Tierra on Puppet for Windows: Remote…
	Use Amazon Q with Co… on Use Amazon CodeWhisperer for…
	Zigya on Mail for Exchange on E72
	Masking PII Image wi… on Mask Words in Image
	Use ChatGPT to check… on Why you need CodeGuru?
	AWS Config Advanced… on AWS Config Advance Queries aga…
	Gene on Fail to quiesce a virtual mach…
	Elisa Caldwell on TSM

Increase max input length for HuggingFace model in SageMaker deployment

Published by Jackie Chen

Leave a comment Cancel reply

Share this:

Related

Published by Jackie Chen

Leave a comment Cancel reply