Hello, I am running the cricket project on Nanos so that a unikernel can use remote GPU resources, but the program crashes with a fatal error and I don't know how to solve it. Here is my detailed build process.
- Compile cricket
git clone https://github.com/RWTH-ACS/cricket.git
cd cricket && git submodule update --init
LOG=INFO make
The build product is located in the bin directory.
- Build a Nanos image
The project is structured as follows:
.
├── bin
│   ├── cricket-client.so
│   ├── cricket-rpc-server
│   ├── cricket-server.so
│   ├── libtirpc.so
│   ├── libtirpc.so.3
│   └── tests
│       ├── api.testapp
│       ├── bandwidthTest.sample
│       ├── cpu.testapp
│       ├── cricket.testapp
│       ├── kernel.testapp
│       ├── matrixMul.compressed.sample
│       ├── matrixMul.uncompressed.sample
│       ├── mnistCUDNN.sample
│       ├── nbody.compressed.sample
│       ├── nbody.uncompressed.sample
│       ├── test_list.test
│       └── test_resource_mg.test
├── config.json
├── etc
│   └── netconfig
├── lib
│   └── x86_64-linux-gnu
│       ├── cricket-client.so
│       ├── libcudart.so.12
│       ├── libdl.so.2
│       ├── libelf.so.1
│       ├── librt.so.1
│       ├── libtirpc.so.3
│       └── libz.so.1
├── proc
│   └── self
│       └── comm
└── start.sh
config.json
{
  "MapDirs": {
    "./etc/*": "/etc",
    "./lib/*": "/usr/lib",
    "./proc/*": "/proc"
  },
  "Env": {
    "REMOTE_GPU_ADDRESS": "192.168.1.63",
    "LD_PRELOAD": "/usr/lib/x86_64-linux-gnu/cricket-client.so"
  }
}
start.sh
appName='cricket.testapp'
echo $appName > ./proc/self/comm
ops run bin/tests/$appName -c config.json -b -t tap0 --ip-address 192.168.1.166
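Since config.json maps ./lib/* to /usr/lib, the LD_PRELOAD path /usr/lib/x86_64-linux-gnu/cricket-client.so should correspond to ./lib/x86_64-linux-gnu/cricket-client.so in the project directory. Below is a minimal sketch of a pre-flight check for that. The helper name `check_preload` is hypothetical, and the /usr/lib to ./lib mapping is hard-coded on the assumption that MapDirs preserves the subdirectory layout.

```shell
# check_preload IMAGE_PATH [PROJECT_ROOT]
# Hypothetical helper: translate an in-image path under /usr/lib back to the
# local ./lib tree (mirroring the "./lib/*": "/usr/lib" entry in config.json)
# and report whether the file exists locally.
# The body runs in a subshell, so variables and `exit` stay local to the call.
check_preload() (
    image_path=$1
    root=${2:-.}
    case $image_path in
        /usr/lib/*) local_path="$root/lib/${image_path#/usr/lib/}" ;;
        *) echo "unmapped: $image_path"; exit 2 ;;
    esac
    if [ -f "$local_path" ]; then
        echo "ok: $local_path"
    else
        echo "missing: $local_path"
        exit 1
    fi
)

# Example (run from the project root, before ./start.sh):
#   check_preload /usr/lib/x86_64-linux-gnu/cricket-client.so
```

This only confirms that the preloaded library is present in the tree that gets packed into the image. It does not check that the library's own dependencies are resolvable inside Nanos.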
- Run the program
Run the cricket server on a machine with an NVIDIA GPU and a CUDA environment:
./bin/cricket-rpc-server
welcome to cricket!
+08:00:00.000004 INFO: using TCP...
+08:00:00.056793 INFO: listening on port 64172
+08:00:00.323820 INFO: waiting for RPC requests...
Run the Nanos instance:
./start.sh
The instance then crashes with signal 11 (a page fault at address 0x0) right after it starts connecting to the server:
running local instance
booting /root/.ops/images/cricket ...
en1: assigned 192.168.1.166
+00:00:00.000389 INFO: connection to host "192.168.1.63"
+00:00:00.011436 INFO: connecting via TCP...
en1: assigned FE80::64AC:D2FF:FE0E:B2EE

*** signal 11 received by tid 2, errno 0, code 1
fault address 0x0

*** Thread context:
lastvector: 000000000000000e (Page fault)
frame: ffffc00002a02000
type: thread
active_cpu: 00000000ffffffff
stack top: 0000000000000000
error code: 0000000000000004
address: 0000000000000000

rax: 0000000000000000
rbx: 0000000000676c40
rcx: 0000000000000000
rdx: 0000000000000001
rsi: 00000000000026d0
rdi: 0000000000674570
rbp: 0000000000000001
rsp: 00000000ffd7e9b0
r8: 0000000000000000
r9: 0000000000000001
r10: fffffffffffff8fa
r11: 0000008c21f68c70
r12: 00000000000026d0
r13: 0000000000000000
r14: 00000000ffd7eda8
r15: 0000000000679cd8
rip: 0000008c21f68c95
rflags: 0000000000010202
ss: 000000000000002b
cs: 0000000000000023
ds: 0000000000000000
es: 0000000000000000
fsbase: 0000000100b61000
gsbase: 0000000000000000

frame trace:

loaded klibs:

stack trace:
00000000ffd7e9b0: 0000000000679cd8
00000000ffd7e9b8: 0000000000676c40
00000000ffd7e9c0: 0000000000674570
00000000ffd7e9c8: 00000000000026d0
00000000ffd7e9d0: 0000000000000000
00000000ffd7e9d8: 00000000ffd7eda8
00000000ffd7e9e0: 0000000000679cd8
00000000ffd7e9e8: 000000f117489f50
00000000ffd7e9f0: 00000000ffd7ea54
00000000ffd7e9f8: 00000039e87aee56
00000000ffd7ea00: 1c000080e87bbf60
00000000ffd7ea08: 1dcd29f6f5a0d200
00000000ffd7ea10: 000000f1174f4c20
00000000ffd7ea18: 1dcd29f6f5a0d200
00000000ffd7ea20: 00000000ffd7eb50
00000000ffd7ea28: 00000039e87adba9
00000000ffd7ea30: 0000000000000000
00000000ffd7ea38: 1dcd29f6f5a0d200
00000000ffd7ea40: 00000000ffd7eb48
00000000ffd7ea48: 00000039e87ae387
00000000ffd7ea50: 0000000000000000
00000000ffd7ea58: 0000000000000000
00000000ffd7ea60: 00000000ffd7eb58
00000000ffd7ea68: 00000039e87adb19
00000000ffd7ea70: 0000000000000000
00000000ffd7ea78: 000000000097c8c0
00000000ffd7ea80: 000000000097c928
00000000ffd7ea88: 00000039e87bbf60
00000000ffd7ea90: 00000000ffd7eb40
00000000ffd7ea98: 00000039e87adab0
00000000ffd7eaa0: 00000000ffd7eb30
00000000ffd7eaa8: 00000039e87a18c9
core dump
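To help read the dump: lastvector 0xe is an x86 page fault, and the "error code" field is the hardware page-fault error code. In that code, bit 0 distinguishes a protection violation from a non-present page, bit 1 a write from a read, bit 2 user mode from kernel mode, and bit 4 an instruction fetch. The sketch below decodes those bits; the function name `decode_pf_error_code` is made up for illustration and is not part of Nanos or ops.

```shell
# decode_pf_error_code HEX
# Hypothetical helper: decode the low bits of an x86 page-fault error code.
# Accepts the value with or without a leading 0x.
# The body runs in a subshell, so its variables do not leak to the caller.
decode_pf_error_code() (
    code=$(( 0x${1#0x} ))
    if [ $(( code & 1 )) -ne 0 ]; then cause="protection violation"; else cause="non-present page"; fi
    if [ $(( code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $(( code & 4 )) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    # Bit 4 marks an instruction fetch, which overrides the read/write label.
    if [ $(( code & 16 )) -ne 0 ]; then access="instruction fetch"; fi
    echo "$mode $access, $cause"
)

decode_pf_error_code 0000000000000004   # value taken from the dump above
```

For the error code 4 in this dump, that is a user-mode read of a non-present page. Together with address 0x0, this is consistent with a NULL pointer dereference in user-space code (the test application or a preloaded library) rather than in the kernel, although it does not by itself identify which function faulted.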