Run PyTorch inference on nanoVM

I’m trying to run PyTorch inference on nanoVM (this requires additional Python packages such as transformers).

I’m trying to run from my Python environment (venv) with the needed packages.

My config file is:

{
  "Env": {
    "HOME": "/",
    "PYTHONDONTWRITEBYTECODE": "1",
    "USER": "root",
    "TRANSFORMERS_CACHE": "/model-cache",
    "HF_HOME": "/model-cache",
    "TOKENIZERS_PARALLELISM": "false",
    "OMP_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
    "PYTORCH_NO_CUDA_MEMORY_CACHING": "1",
    "SSL_CERT_FILE": "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem",
    "REQUESTS_CA_BUNDLE": "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem"
  },
  "MapDirs": {
    "./venv_3_8/*": "/venv_3_8"
  },
  "Args": [
    "/venv_3_8/bin/python3.8",
    "./llama3_inference.py"
  ]
}

The command is:

ops pkg load eyberg/pytorch:2.1.1 -c config.json --nanos-version=0.1.52

I have a venv_3_8 directory in the app directory from which I run the ops command.

But I get an error:
SyntaxError: Non-UTF-8 code starting with '\xe2' in file /venv_3_8/bin/python3.8 on line 2, but no encoding declared; see PEP 263 (Defining Python Source Code Encodings) for details

It seems that the nanoVM does not recognize the Python executable…

How can I solve this and run correctly from my venv with the needed packages?

(The Python inference works well in the local environment, without nanoVM.)

thanks

Hi Sara - thanks for moving this over.

As I mentioned in that other thread, the '\xe2' error you are getting is a multibyte issue in one of your source files that it’s not recognizing. Sometimes this can happen because of copy/paste.

Is it possible to see a copy of your llama3_inference.py or something reproducible - perhaps a GitHub repo with the code showing this?

# llama3_inference.py

import sys
import torch
import warnings

from transformers import AutoTokenizer, AutoModelForCausalLM

# Suppress warnings for cleaner output
warnings.filterwarnings("ignore")

def main():
    print("Starting TinyLlama inference on NanoVM...")
    
    # Model configuration
    model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    
    try:
        print("Loading tokenizer...")
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        
        print("Loading model...")
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype=torch.float32,  # Use float32 for CPU compatibility
            device_map="cpu",           # Force CPU usage
            low_cpu_mem_usage=True,     # Optimize memory usage
            trust_remote_code=True
        )
        
        print("Model loaded successfully!")
        
        # Test prompt
        prompt = "<|system|>\nYou are a helpful assistant.\n<|user|>\nwhat is the capital of England?\n<|assistant|>\n"
        
        print(f"Input prompt: {prompt}")
        print("Running inference...")
        
        # Tokenize
        inputs = tokenizer(prompt, return_tensors="pt")
        
        # Generate
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=50,
                temperature=0.7,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
                eos_token_id=tokenizer.eos_token_id
            )
        
        # Decode output
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print("=" * 60)
        print("RESPONSE:")
        print(response)
        print("=" * 60)
        
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)
    
    print("Inference completed successfully!")

if __name__ == "__main__":
    main()

This error is also reproduced with a simple "hello world" printing script.

If you’re reproducing this in just a simple hello world, that’d be easiest to diagnose. I don’t see that in what you pasted, which means it’s probably in your venv.

What kind of locale do you have locally, and is your host Linux/Mac/Windows?

You can find the locale by running this for example:

➜  ~ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

If you throw a reproducible hello world into a GitHub repo, we could probably find what you are running into. You can also try something like this https://stackoverflow.com/questions/21639275/python-syntaxerror-non-ascii-character-xe2-in-file to hunt down which file is causing this.
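As a sketch of that hunt (standard library only; the root path is an assumption - point it at whatever directory holds your sources), this scans every .py file for the first byte that is not valid UTF-8:

```python
import pathlib

def find_non_utf8(root="."):
    """Return (path, offset, byte) for each .py file under root
    whose contents are not valid UTF-8."""
    bad = []
    for path in pathlib.Path(root).rglob("*.py"):
        data = path.read_bytes()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError as e:
            # e.start is the offset of the first undecodable byte
            bad.append((str(path), e.start, hex(data[e.start])))
    return bad

if __name__ == "__main__":
    for path, offset, byte in find_non_utf8():
        print(f"{path}: byte {byte} at offset {offset}")
```

Any hit it reports is a file that would raise exactly this kind of SyntaxError when fed to the interpreter as source.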

I get the error only in the nanoVM; when I run the Python script on my host, it works well.
In addition, the file that the error message mentions is the Python executable (/venv_3_8/bin/python3.8), not the script file. (The error also occurs when I run /venv_3_8/bin/python3.8 --version, without any Python script.) So the problem is not in the Python script.
I think the nanoVM does not succeed in parsing the Python executable correctly.
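That would be consistent with a symlink problem: in a venv, bin/python3.8 is normally a symlink to the real interpreter, so if the image build resolves or copies it oddly, the guest could end up handing an ELF binary to a script interpreter. A hypothetical check of what that path actually is on the host (the target path below is the one from this thread):

```python
import os

def classify(path):
    """Rough classification of what a 'python' path points at."""
    if os.path.islink(path):
        return "symlink -> " + os.readlink(path)
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"\x7fELF":
        return "ELF binary"          # a real interpreter executable
    if magic[:2] == b"#!":
        return "script with shebang"
    return "unknown"

if __name__ == "__main__":
    target = "venv_3_8/bin/python3.8"  # path from this thread
    if os.path.islink(target) or os.path.exists(target):
        print(classify(target))
```

If it prints a symlink on the host, it is worth checking whether the copy inside the image is still a working link (or the dereferenced ELF file) rather than something else.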

My host is Linux.
I get the error with both of these locale outputs:
~ locale
LANG=en_IL
LC_CTYPE="en_IL"
LC_NUMERIC="en_IL"
LC_TIME="en_IL"
LC_COLLATE="en_IL"
LC_MONETARY="en_IL"
LC_MESSAGES="en_IL"
LC_PAPER="en_IL"
LC_NAME="en_IL"
LC_ADDRESS="en_IL"
LC_TELEPHONE="en_IL"
LC_MEASUREMENT="en_IL"
LC_IDENTIFICATION="en_IL"
LC_ALL=

and also when I change it to:
~ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

A simple Python script that reproduces the error when I run it on the nanoVM:
print("hello world")
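To rule out invisible bytes in that repro itself, a small check like this rewrites the one-liner with a strict ASCII codec and then inspects the raw bytes on disk; an invalid UTF-8 sequence in a source file with no encoding declaration is what produces this SyntaxError, so any byte at or above 0x80 (a stray BOM, smart quote, etc.) is worth looking at:

```python
# Recreate the one-line script with a strict ASCII codec, then read it
# back as raw bytes and list anything outside plain ASCII.
with open("hello.py", "w", encoding="ascii") as f:
    f.write('print("hello world")\n')

data = open("hello.py", "rb").read()
suspicious = [hex(b) for b in data if b >= 0x80]
print(suspicious)  # [] means the script on disk is clean ASCII
```

If this clean file still fails in the guest, the bad bytes are not coming from the script.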

thanks

I see - you’re passing the Python interpreter as the first argument, but the package itself already sets it (if you look in ~/.ops/packages/amd64/eyberg/pytorch_2.1.1/package.manifest):

"Args": ["python3"],

What you can do here is simply omit the

    "/venv_3_8/bin/python3.8",

line in your config. If you need Python 3.8, you’d need to replace what’s in the package, as it’s using Python 3.10.

Also, if I run with python3.10 and set this in my config:

  "Args": [
    "python3.10",
    "/llama3_inference.py"
  ]

I get the error:
SyntaxError: Non-UTF-8 code starting with '\x80' in file //python3 on line 2, but no encoding declared; see PEP 263 (Defining Python Source Code Encodings) for details

and if I run without the Args setting in the config file (using the default Python interpreter), I get the error:
Traceback (most recent call last):
  File "//llama3_inference.py", line 9, in <module>
    from transformers import AutoTokenizer, AutoModelForCausalLM
ModuleNotFoundError: No module named 'transformers'

I need the 'transformers' module in my script, so that is the reason I want to run from my venv with the needed modules.

As @eyberg said, you must not put the name of the Python interpreter in the "Args" array: if you want to run the llama3_inference.py script, your Args should be:

  "Args": [
    "/llama3_inference.py"
  ]

If you want to run the Python interpreter from your venv, you cannot use the pytorch:2.1.1 package as is, because it includes its own Python interpreter. You should either create a local package that is a modified version of the pytorch:2.1.1 package (see the Packages page in the Ops documentation), or do without packages altogether, in which case you would have to re-create the entire environment, including your Python interpreter and all needed modules, via the "Dirs" and "Files" directives (see the Configuration page in the Ops documentation).
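For the no-package route, a hypothetical starting point might look like the config below. Everything in it is an assumption to verify against the Ops configuration docs: that "Dirs" copies the venv tree into the image whole, that the venv's interpreter and its shared-library dependencies end up inside the image, and that PYTHONHOME/PYTHONPATH are the right settings for a relocated Python 3.8:

```json
{
  "Dirs": ["venv_3_8"],
  "Files": ["llama3_inference.py"],
  "Env": {
    "PYTHONHOME": "/venv_3_8",
    "PYTHONPATH": "/venv_3_8/lib/python3.8/site-packages"
  },
  "Args": ["/llama3_inference.py"]
}
```

The interpreter itself would then be passed on the command line, e.g. something like `ops run venv_3_8/bin/python3.8 -c config.json` - again a sketch of the invocation, not a verified recipe.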

At this point, since I’m not converging toward a working solution, I’ve decided to put this task on hold for now.
If I have any updates or progress on this task in the future, I’ll make sure to share them here.

Thanks for your time and assistance.