I deployed HuggingFace zephyr-7b-beta model to SageMaker by using the default deploy.py script. When trying to invoke the model endpoint, I received the error “ValueError: Error raised by inference endpoint: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (422) from primary with message “{“error”:”Input validation error: inputs must have less than 1024 tokens.”
It turns out that the default max input length is 1024 tokens. To increase it, I need to pass the MAX_INPUT_LENGTH parameter in the env setting. Here is the full code
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
"inputs": "<|system|>\nYou are a pirate chatbot who always responds with Arr!</s>\n<|user|>\nThere's a llama on my lawn, how can I get rid of him?</s>\n<|assistant|>\n",