Run PyTorch inference on nanoVM

I’m trying to run PyTorch inference on nanoVM (this requires additional Python packages such as transformers).

I’m trying to run from my Python environment (venv) with the needed packages.

My config file is:

{
  "Env": {
    "HOME": "/",
    "PYTHONDONTWRITEBYTECODE": "1",
    "USER": "root",
    "TRANSFORMERS_CACHE": "/model-cache",
    "HF_HOME": "/model-cache",
    "TOKENIZERS_PARALLELISM": "false",
    "OMP_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
    "PYTORCH_NO_CUDA_MEMORY_CACHING": "1",
    "SSL_CERT_FILE": "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem",
    "REQUESTS_CA_BUNDLE": "/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem"
  },
  "MapDirs": {
    "./venv_3_8/*": "/venv_3_8"
  },
  "Args": [
    "/venv_3_8/bin/python3.8",
    "./llama3_inference.py"
  ]
}

The command is:

ops pkg load eyberg/pytorch:2.1.1 -c config.json --nanos-version=0.1.52

I have a venv_3_8 directory in the app directory from which I run the ops command.

But I get an error:
SyntaxError: Non-UTF-8 code starting with '\xe2' in file /venv_3_8/bin/python3.8 on line 2, but no encoding declared; see PEP 263 (Defining Python Source Code Encodings) for details

It seems that the nanoVM does not recognize the Python executable…

How can I solve this and run correctly from my venv with the needed packages?

(The Python inference works well in the local environment, without nanoVM.)

thanks

Hi Sara - thanks for moving this over.

As I mentioned in that other thread, the '\xe2' error you are getting is a multibyte issue in one of your source files that it’s not recognizing. Sometimes this can happen because of copy/paste.

Is it possible to see a copy of your llama3_inference.py or something reproducible - perhaps a GitHub repo with the code showing this?

# llama3_inference.py

import sys
import torch
import warnings

from transformers import AutoTokenizer, AutoModelForCausalLM

# Suppress warnings for cleaner output
warnings.filterwarnings("ignore")

def main():
    print("Starting TinyLlama inference on NanoVM...")
    
    # Model configuration
    model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    
    try:
        print("Loading tokenizer...")
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        
        print("Loading model...")
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype=torch.float32,  # Use float32 for CPU compatibility
            device_map="cpu",           # Force CPU usage
            low_cpu_mem_usage=True,     # Optimize memory usage
            trust_remote_code=True
        )
        
        print("Model loaded successfully!")
        
        # Test prompt
        prompt = "<|system|>\nYou are a helpful assistant.\n<|user|>\nwhat is the capital of England?\n<|assistant|>\n"
        
        print(f"Input prompt: {prompt}")
        print("Running inference...")
        
        # Tokenize
        inputs = tokenizer(prompt, return_tensors="pt")
        
        # Generate
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=50,
                temperature=0.7,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
                eos_token_id=tokenizer.eos_token_id
            )
        
        # Decode output
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print("=" * 60)
        print("RESPONSE:")
        print(response)
        print("=" * 60)
        
    except Exception as e:
        print(f"Error: {e}")
        sys.exit(1)
    
    print("Inference completed successfully!")

if __name__ == "__main__":
    main()

This error is also reproduced with a simple "hello world" printing script.

If you’re reproducing this in just a simple hello world, that’d be easiest to diagnose. I don’t see that in what you pasted, which means it’s probably in your venv.

What kind of locale do you have locally, and is your host Linux/Mac/Windows?

You can find the locale by running this for example:

➜  ~ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

If you throw a reproducible hello world into a GitHub repo, we could probably find what you are running into. You can also try something like this https://stackoverflow.com/questions/21639275/python-syntaxerror-non-ascii-character-xe2-in-file to hunt down which file is causing this.
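As a sketch of that hunt (standard library only; the root path is an assumption - point it at whatever directory holds your sources), this scans every .py file for the first byte that is not valid UTF-8:

```python
import pathlib

def find_non_utf8(root="."):
    """Return (path, offset, byte) for each .py file under root
    whose contents are not valid UTF-8."""
    bad = []
    for path in pathlib.Path(root).rglob("*.py"):
        data = path.read_bytes()
        try:
            data.decode("utf-8")
        except UnicodeDecodeError as e:
            # e.start is the offset of the first undecodable byte
            bad.append((str(path), e.start, hex(data[e.start])))
    return bad

if __name__ == "__main__":
    for path, offset, byte in find_non_utf8():
        print(f"{path}: byte {byte} at offset {offset}")
```

Any hit it reports is a file that would raise exactly this kind of SyntaxError when fed to the interpreter as source.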

I get the error only in the nanoVM; when I run the Python script on my host, it works well.
In addition, the file that the error message mentions is the Python executable (/venv_3_8/bin/python3.8), not the script file. (The error also occurs when I run /venv_3_8/bin/python3.8 --version, without any Python script.) So the problem is not in the Python script.
I think the nanoVM does not succeed in parsing the Python executable correctly.
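That would be consistent with a symlink problem: in a venv, bin/python3.8 is normally a symlink to the real interpreter, so if the image build resolves or copies it oddly, the guest could end up handing an ELF binary to a script interpreter. A hypothetical check of what that path actually is on the host (the target path below is the one from this thread):

```python
import os

def classify(path):
    """Rough classification of what a 'python' path points at."""
    if os.path.islink(path):
        return "symlink -> " + os.readlink(path)
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"\x7fELF":
        return "ELF binary"          # a real interpreter executable
    if magic[:2] == b"#!":
        return "script with shebang"
    return "unknown"

if __name__ == "__main__":
    target = "venv_3_8/bin/python3.8"  # path from this thread
    if os.path.islink(target) or os.path.exists(target):
        print(classify(target))
```

If it prints a symlink on the host, it is worth checking whether the copy inside the image is still a working link (or the dereferenced ELF file) rather than something else.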

My host is Linux.
I get the error with both of these locale outputs:
~ locale
LANG=en_IL
LC_CTYPE="en_IL"
LC_NUMERIC="en_IL"
LC_TIME="en_IL"
LC_COLLATE="en_IL"
LC_MONETARY="en_IL"
LC_MESSAGES="en_IL"
LC_PAPER="en_IL"
LC_NAME="en_IL"
LC_ADDRESS="en_IL"
LC_TELEPHONE="en_IL"
LC_MEASUREMENT="en_IL"
LC_IDENTIFICATION="en_IL"
LC_ALL=

and also when I change it to:
~ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

A simple Python script that reproduces the error when I run it on the nanoVM:
print("hello world")
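To rule out invisible bytes in that repro itself, a small check like this rewrites the one-liner with a strict ASCII codec and then inspects the raw bytes on disk; an invalid UTF-8 sequence in a source file with no encoding declaration is what produces this SyntaxError, so any byte at or above 0x80 (a stray BOM, smart quote, etc.) is worth looking at:

```python
# Recreate the one-line script with a strict ASCII codec, then read it
# back as raw bytes and list anything outside plain ASCII.
with open("hello.py", "w", encoding="ascii") as f:
    f.write('print("hello world")\n')

data = open("hello.py", "rb").read()
suspicious = [hex(b) for b in data if b >= 0x80]
print(suspicious)  # [] means the script on disk is clean ASCII
```

If this clean file still fails in the guest, the bad bytes are not coming from the script.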

thanks

I see - you’re passing the Python interpreter as the first argument, but the package itself already sets it (if you look in ~/.ops/packages/amd64/eyberg/pytorch_2.1.1/package.manifest):

"Args": ["python3"],

What you can do here is simply omit the

    "/venv_3_8/bin/python3.8",

line in your config. If you need Python 3.8, you’d need to replace what’s in the package, as it’s using Python 3.10.

Also, if I run with python3.10 and set this in my config:

  "Args": [
    "python3.10",
    "/llama3_inference.py"
  ]

I get the error:
SyntaxError: Non-UTF-8 code starting with '\x80' in file //python3 on line 2, but no encoding declared; see PEP 263 (Defining Python Source Code Encodings) for details

and if I run without the Args setting in the config file (using the default Python interpreter), I get the error:
Traceback (most recent call last):
  File "//llama3_inference.py", line 9, in <module>
    from transformers import AutoTokenizer, AutoModelForCausalLM
ModuleNotFoundError: No module named 'transformers'

I need the 'transformers' module in my script, so that is the reason I want to run from my venv with the needed modules.

As @eyberg said, you must not put the name of the Python interpreter in the "Args" array: if you want to run the llama3_inference.py script, your Args should be:

  "Args": [
    "/llama3_inference.py"
  ]

If you want to run the Python interpreter from your venv, you cannot use the pytorch:2.1.1 package as is, because it includes its own Python interpreter. You should either create a local package that is a modified version of the pytorch:2.1.1 package (see the Packages page in the Ops documentation), or do without packages altogether, in which case you would have to re-create the entire environment, including your Python interpreter and all needed modules, via the "Dirs" and "Files" directives (see the Configuration page in the Ops documentation).
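For the no-package route, a hypothetical starting point might look like the config below. Everything in it is an assumption to verify against the Ops configuration docs: that "Dirs" copies the venv tree into the image whole, that the venv's interpreter and its shared-library dependencies end up inside the image, and that PYTHONHOME/PYTHONPATH are the right settings for a relocated Python 3.8:

```json
{
  "Dirs": ["venv_3_8"],
  "Files": ["llama3_inference.py"],
  "Env": {
    "PYTHONHOME": "/venv_3_8",
    "PYTHONPATH": "/venv_3_8/lib/python3.8/site-packages"
  },
  "Args": ["/llama3_inference.py"]
}
```

The interpreter itself would then be passed on the command line, e.g. something like `ops run venv_3_8/bin/python3.8 -c config.json` - again a sketch of the invocation, not a verified recipe.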

At this point, since I’m not converging toward a working solution, I’ve decided to put this task on hold for now.
If I have any updates or progress on this task in the future, I’ll make sure to share them here.

Thanks for your time and assistance.