I deployed the HuggingFace zephyr-7b-beta model to SageMaker using the default deploy.py script. When I tried to invoke the model endpoint, I received this error:
ValueError: Error raised by inference endpoint: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (422) from primary with message "{"error":"Input validation error: inputs must have less than 1024 tokens."
It turns out that the default maximum input length is 1024 tokens. To increase it, I needed to pass the MAX_INPUT_LENGTH parameter in the env settings that deploy.py hands to the model container.
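A minimal sketch of what that change can look like, assuming the default deploy.py follows the usual HuggingFaceModel pattern from the SageMaker Python SDK. The helper names (build_tgi_env, deploy), the limits 2048 and 4096, the instance type, and the startup timeout are illustrative assumptions, not values from the original script. As far as I know, the launcher also expects MAX_TOTAL_TOKENS to be larger than MAX_INPUT_LENGTH, so the sketch sets both.

```python
import json


def build_tgi_env(model_id, max_input_length=2048, max_total_tokens=4096, num_gpus=1):
    """Build the env dict for the TGI container (hypothetical helper).

    The limits are illustrative; pick values that fit your GPU memory.
    """
    # Assumption: total tokens (prompt + generated) must exceed the input limit.
    if max_total_tokens <= max_input_length:
        raise ValueError("max_total_tokens must be greater than max_input_length")
    # Container environment variables must be strings, hence json.dumps.
    return {
        "HF_MODEL_ID": model_id,
        "SM_NUM_GPUS": json.dumps(num_gpus),
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }


def deploy(env, role, instance_type="ml.g5.2xlarge"):
    """Deploy the model with the given env (requires AWS credentials and a role ARN)."""
    # Imported here so the env can be built and inspected without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

    model = HuggingFaceModel(
        image_uri=get_huggingface_llm_image_uri("huggingface"),
        env=env,
        role=role,
    )
    return model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,  # assumed instance type
        container_startup_health_check_timeout=300,  # assumed timeout in seconds
    )


env = build_tgi_env("HuggingFaceH4/zephyr-7b-beta")
# predictor = deploy(env, role="<your SageMaker execution role ARN>")
```

The endpoint has to be redeployed for the new limits to take effect, since the container reads these variables at startup.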
Reference: https://huggingface.co/docs/text-generation-inference/basic_tutorials/launcher#maxinputlength