Kserve를 통해 모델 배포하기

위 링크를 참고하여 Kserve를 설치하고 모델을 배포해보자. 아래 내용은 위 링크를 참고하여 작성되었다.

Kserve 설치

aimlfw-dep 디렉토리로 가서
```
 bin/install-kserve.sh
```

QoE 모델 배포

먼저 네임스페이스를 생성한다.

 kubectl create namespace kserve-test

qoe.yaml 파일을 생성해서 아래 내용을 넣어준다.

 apiVersion: "serving.kserve.io/v1beta1"
 kind: "InferenceService"
 metadata:
   name: qoe-model
 spec:
   predictor:
     tensorflow:
       storageUri: "<update Model URL here>"
       runtimeVersion: "2.5.1"
       resources:
         requests:
           cpu: 0.1
           memory: 0.5Gi
         limits:
           cpu: 0.1
           memory: 0.5Gi

storageUri에 모델 URL을 넣을 때 localhost 대신 tm.traininghost를 넣어주자.

나는 localhost로 했을 때 storage-initializer에서 connection refused 오류가 발생했다.

 kubectl apply -f qoe.yaml -n kserve-test

그런데 아마 kserve-test에 pod가 생기지 않았을 것이다.

 kubectl describe inferenceservice qoe-model -n kserve-test

로 오류를 확인해보자.

 no runtime found to support predictor with model type: {tensorflow <nil>}

이런 메시지가 뜨는데, runtime을 찾을 수 없다는 오류가 뜬다.

Kserve github repository 로 가 kserve-cluster-resources.yaml 로 runtime을 설치해주자.

나는 최신 버전인 v0.12.1을 사용했다.

 kubectl apply -f https://raw.githubusercontent.com/kserve/kserve/master/install/v0.12.1/kserve-cluster-resources.yaml

runtime이 잘 올라갔는지 확인해보자.

 kubectl get clusterservingruntimes

이제 다시 모델을 배포해보자.

우선 기존 qoe-model을 삭제한다.

 kubectl delete inferenceservice qoe-model -n kserve-test

그리고 다시 모델을 배포한다.

 kubectl apply -f qoe.yaml -n kserve-test

확인해보면

 $ kubectl get pods -n kserve-test

 NAME                                                    READY   STATUS    RESTARTS   AGE
 qoe-model-predictor-00001-deployment-6d895c7889-x8rhc   2/2     Running   0          45m

 $ kubectl get inferenceservice qoe-model -n kserve-test

 NAME        URL                                              READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION         AGE
 qoe-model   http://qoe-model.kserve-test.svc.cluster.local   True           100                              qoe-model-predictor-00001   47m

이런 식으로 pod가 잘 올라간 것을 확인할 수 있다.

예측

input_qoe.json 파일을 생성해서 아래 내용을 넣어준다.

 {"signature_name": "serving_default", "instances": [[[2.56, 2.56],
       [2.56, 2.56],
       [2.56, 2.56],
       [2.56, 2.56],
       [2.56, 2.56],
       [2.56, 2.56],
       [2.56, 2.56],
       [2.56, 2.56],
       [2.56, 2.56],
       [2.56, 2.56]]]}

predict.sh 파일을 생성해서 아래 내용을 넣어준다.

 model_name=qoe-model
 file_name=input_qoe.json
 NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
 NODE_PORT=$(kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}')
 curl -v -H "Host: $model_name.kserve-test.svc.cluster.local" http://$NODE_IP:$NODE_PORT/v1/models/$model_name:predict -d @./$file_name

Port는

 $ kubectl get svc istio-ingressgateway -n istio-system

 NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                                      AGE
istio-ingressgateway   LoadBalancer   10.105.16.87   <pending>     15021:32342/TCP,80:31582/TCP,443:32400/TCP   3h31m

를 했을 때 나오는 PORT(S) 중에 80에 연결된 포트이다. 여기서는 31582.

IP는 똑같이 kubectl get svc istio-ingressgateway -n istio-system 를 했을 때
EXTERNAL-IP가 있으면 해당 IP를 사용하고,
없으면 kubectl get nodes -o wide 를 했을 때 나오는 INTERNAL-IP를 사용한다.

 $ source predict.sh

 *   Trying 192.168.0.6:31582...
 * Connected to 192.168.0.6 (192.168.0.6) port 31582
 > POST /v1/models/qoe-model:predict HTTP/1.1
 > Host: qoe-model.kserve-test.svc.cluster.local
 > User-Agent: curl/8.7.1
 > Accept: */*
 > Content-Length: 248
 > Content-Type: application/x-www-form-urlencoded
 >
 * upload completely sent off: 248 bytes
 < HTTP/1.1 200 OK
 < content-length: 54
 < content-type: application/json
 < date: Tue, 27 Aug 2024 05:27:47 GMT
 < x-envoy-upstream-service-time: 5
 < server: istio-envoy
 <
 {
     "predictions": [[306.623444, 308.588318]
     ]
 * Connection #0 to host 192.168.0.6 left intact
 }

모델 학습 pipeline 구축하기

Usecase: UAV Path Prediction