Mirror of https://github.com/Wan-Video/Wan2.1.git (synced 2025-06-05 14:54:54 +00:00)
Add Dockerfile, Cloud Build config, and deployment guide for Google Cloud
This commit introduces the necessary files and documentation to enable deployment of the Wan video generation application to Google Cloud Platform.

Key additions:

- Dockerfile: Defines the container image, using an NVIDIA PyTorch base image and including special handling for flash-attn installation. It allows for configurable Gradio application startup.
- cloudbuild.yaml: Provides a Google Cloud Build configuration to automate the Docker image build and push process to Google Artifact Registry.
- DEPLOY_GCLOUD.md: A comprehensive guide detailing the steps to set up prerequisites on GCP, configure and run Cloud Build, deploy the container to Cloud Run (with CPU and GPU options) or to Vertex AI Endpoints, troubleshoot common issues, and select appropriate machine resources.

These changes aim to simplify and standardize the deployment process on Google Cloud, addressing potential issues related to dependencies and environment configuration.
This commit is contained in: parent e5a741309d · commit cf8764c186
DEPLOY_GCLOUD.md (new file, 143 lines)
# Deploying to Google Cloud

This guide provides instructions on how to build and deploy this application to Google Cloud using Docker, Cloud Build, and Cloud Run or Vertex AI Endpoints.

## Prerequisites
1. **Google Cloud Project:** You have a Google Cloud Project with billing enabled.
2. **Enable APIs:** Ensure the following APIs are enabled in your project (they can all be enabled with a single command; see the sketch after this list):
    * Cloud Build API
    * Artifact Registry API
    * Cloud Run API
    * Vertex AI API (if deploying to Vertex AI Endpoints)
    * Identity and Access Management (IAM) API
3. **`gcloud` CLI:** You have the [Google Cloud CLI](https://cloud.google.com/sdk/docs/install) installed and configured.
4. **Permissions:** You have sufficient permissions to create Artifact Registry repositories, trigger Cloud Builds, and deploy to Cloud Run/Vertex AI. Roles like `Artifact Registry Administrator`, `Cloud Build Editor`, `Cloud Run Admin`, and `Vertex AI Administrator` are typically needed.
5. **Docker configured for gcloud:** Configure Docker to authenticate with Artifact Registry:

    ```bash
    gcloud auth configure-docker ${_REGION}-docker.pkg.dev
    ```

    Replace `${_REGION}` with the region of your Artifact Registry (e.g., `us-central1`). You'll define this region in the Cloud Build substitutions.
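All of the APIs listed above can be enabled in one `gcloud` call; a minimal sketch (the project ID is a placeholder):

```bash
gcloud services enable \
  cloudbuild.googleapis.com \
  artifactregistry.googleapis.com \
  run.googleapis.com \
  aiplatform.googleapis.com \
  iam.googleapis.com \
  --project=YOUR_GCP_PROJECT_ID
```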
## Step 1: Configure Cloud Build Substitutions

The `cloudbuild.yaml` file uses substitutions for your Artifact Registry region, repository name, and image name. (`PROJECT_ID` is a built-in substitution that Cloud Build resolves automatically from the project you submit the build to.) The easiest way to set the others for a manual build is via the `gcloud` command.

**Important:** Before running the build, decide on:

* `YOUR_GCP_PROJECT_ID`: Your actual Google Cloud Project ID.
* `YOUR_ARTIFACT_REGISTRY_REGION`: The Google Cloud region for your Artifact Registry (e.g., `us-central1`).
* `YOUR_ARTIFACT_REGISTRY_REPO`: The name for your Artifact Registry repository (e.g., `wan-video-repo`).
* `YOUR_IMAGE_NAME`: The name for your Docker image (e.g., `wan-video-generator`).

The `cloudbuild.yaml` has default values for `_REGION`, `_REPOSITORY`, and `_IMAGE_NAME`. You can either:

* Modify these defaults directly in `cloudbuild.yaml`, or
* Override them when submitting the build using the `--substitutions` flag (recommended for flexibility).

Example of overriding:

`--substitutions=_REGION="us-central1",_REPOSITORY="wan-video-repo",_IMAGE_NAME="wan-video-generator"`

Note that only user-defined substitutions (which must start with an underscore) can be passed this way; built-ins such as `PROJECT_ID` cannot be overridden.
## Step 2: Create an Artifact Registry Repository

If you haven't already, create a Docker repository in Artifact Registry:

```bash
gcloud artifacts repositories create YOUR_ARTIFACT_REGISTRY_REPO \
  --repository-format=docker \
  --location=YOUR_ARTIFACT_REGISTRY_REGION \
  --description="Docker repository for Wan video generator"
```

Ensure `YOUR_ARTIFACT_REGISTRY_REPO` and `YOUR_ARTIFACT_REGISTRY_REGION` match the values you'll use for the `_REPOSITORY` and `_REGION` substitutions in Cloud Build.
## Step 3: Build and Push the Docker Image

Submit the build to Google Cloud Build:

```bash
# Run from the root directory of the project, where cloudbuild.yaml is located.
# Replace placeholders with your actual values; PROJECT_ID is picked up
# automatically from your active gcloud project.
gcloud builds submit --config cloudbuild.yaml \
  --substitutions=_REGION="YOUR_ARTIFACT_REGISTRY_REGION",_REPOSITORY="YOUR_ARTIFACT_REGISTRY_REPO",_IMAGE_NAME="YOUR_IMAGE_NAME" .
```

This command uses `cloudbuild.yaml` to build your Docker image and push it to your Artifact Registry. Note that the `${COMMIT_SHA}` tag in `cloudbuild.yaml` is only populated for trigger-initiated builds; for manual `gcloud builds submit` runs, consider switching that tag to a user-defined substitution (e.g., `_TAG`) or removing it.
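To confirm the push succeeded, you can list the images now in the repository (the path is assembled from the same placeholder values):

```bash
gcloud artifacts docker images list \
  YOUR_ARTIFACT_REGISTRY_REGION-docker.pkg.dev/YOUR_GCP_PROJECT_ID/YOUR_ARTIFACT_REGISTRY_REPO
```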
## Step 4: Deploy the Container

You can deploy the container to Cloud Run or Vertex AI Endpoints.

### Option A: Deploy to Cloud Run

Cloud Run is suitable for web applications like Gradio interfaces.
1. **Basic Deployment (CPU only):**

    ```bash
    # Replace placeholders with your values.
    # YOUR_CLOUD_RUN_REGION should be a region where Cloud Run is available (e.g., us-central1).
    gcloud run deploy YOUR_IMAGE_NAME \
      --image YOUR_ARTIFACT_REGISTRY_REGION-docker.pkg.dev/YOUR_GCP_PROJECT_ID/YOUR_ARTIFACT_REGISTRY_REPO/YOUR_IMAGE_NAME:latest \
      --platform managed \
      --region YOUR_CLOUD_RUN_REGION \
      --allow-unauthenticated \
      --port 7860 \
      --set-env-vars GRADIO_APP_SCRIPT="gradio/t2v_14B_singleGPU.py" \
      --memory=4Gi --cpu=2
    # Adjust memory, CPU, and other flags as needed.
    # For larger models, you will need significantly more memory and potentially more CPU.
    ```

    * The `--port 7860` matches the `EXPOSE 7860` in the Dockerfile.
    * Use `--set-env-vars` to specify which Gradio app to run via the `GRADIO_APP_SCRIPT` environment variable.
    * The image path format is `REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE_NAME:latest`.
2. **Deployment with GPU:**

    Cloud Run supports GPUs in the Gen2 execution environment (check availability and supported GPU types in your region).

    ```bash
    # Replace placeholders. YOUR_CLOUD_RUN_REGION must support Cloud Run GPUs.
    # Note: Cloud Run attaches GPUs via --gpu/--gpu-type rather than VM machine
    # types, and GPU services require CPU to stay allocated; these flags may
    # change while GPU support evolves.
    gcloud beta run deploy YOUR_IMAGE_NAME-gpu \
      --image YOUR_ARTIFACT_REGISTRY_REGION-docker.pkg.dev/YOUR_GCP_PROJECT_ID/YOUR_ARTIFACT_REGISTRY_REPO/YOUR_IMAGE_NAME:latest \
      --platform managed \
      --region YOUR_CLOUD_RUN_REGION \
      --allow-unauthenticated \
      --port 7860 \
      --set-env-vars GRADIO_APP_SCRIPT="gradio/t2v_14B_singleGPU.py" \
      --memory=16Gi --cpu=4 \
      --execution-environment gen2 \
      --no-cpu-throttling \
      --gpu=1 --gpu-type=nvidia-l4
    # Consult the Cloud Run documentation for current GPU options.
    ```
3. **Considerations for Cloud Run:**
    * **Timeout:** The default request timeout is 5 minutes, which video generation will usually exceed. Increase it at deploy time (e.g., `--timeout=3600`; the maximum is 60 minutes).
    * **Concurrency:** Adjust based on instance capacity; for heavy models, `--concurrency=1` is a sensible starting point.
    * **Model Loading & Storage:**
        * Models are currently packaged in the Docker image, which increases image size.
        * For very large models, consider downloading them at container startup from Google Cloud Storage (GCS) instead. This requires modifying the Docker `CMD` or `ENTRYPOINT`; see the sketch after this list.
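A minimal sketch of such a startup script, assuming a hypothetical `gs://YOUR_BUCKET/wan-models` layout and that the Google Cloud SDK (for `gsutil`) has been added to the image (the NGC base image does not ship it):

```bash
#!/bin/sh
# entrypoint.sh -- hypothetical startup script; adjust names and paths to your setup.
set -e

MODEL_DIR="${MODEL_CKPT_DIR:-/app/models}"
GCS_MODEL_PATH="${GCS_MODEL_PATH:-gs://YOUR_BUCKET/wan-models}"  # placeholder bucket

# Download weights only if the model directory is empty.
if [ -z "$(ls -A "$MODEL_DIR" 2>/dev/null)" ]; then
  echo "Downloading models from $GCS_MODEL_PATH ..."
  mkdir -p "$MODEL_DIR"
  gsutil -m cp -r "$GCS_MODEL_PATH/*" "$MODEL_DIR/"
fi

exec python "${GRADIO_APP_SCRIPT:-gradio/t2v_14B_singleGPU.py}"
```

You would then point the Dockerfile at it with `ENTRYPOINT ["sh", "/app/entrypoint.sh"]` and grant the service's runtime service account `roles/storage.objectViewer` on the bucket.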
### Option B: Deploy to Vertex AI Endpoints

Vertex AI Endpoints are better suited for dedicated ML model serving and offer more powerful hardware options.

1. **Create an Endpoint:**

    ```bash
    # Replace placeholders. YOUR_VERTEX_AI_REGION is e.g., us-central1.
    gcloud ai endpoints create \
      --project=YOUR_GCP_PROJECT_ID \
      --region=YOUR_VERTEX_AI_REGION \
      --display-name="wan-video-endpoint"
    ```

    Note the `ENDPOINT_ID` from the output (you can recover it later with `gcloud ai endpoints list --region=YOUR_VERTEX_AI_REGION`).
2. **Deploy the model (container) to the Endpoint:**

    Vertex AI deploys containers in two steps: `gcloud ai models upload` registers the container image as a model, and `gcloud ai endpoints deploy-model` attaches it to the endpoint.

    ```bash
    # Replace placeholders. ENDPOINT_ID is from the previous command.
    # MACHINE_TYPE can be n1-standard-4, or a GPU machine type like a2-highgpu-1g.

    # 1) Register the container image as a Vertex AI model.
    gcloud ai models upload \
      --project=YOUR_GCP_PROJECT_ID \
      --region=YOUR_VERTEX_AI_REGION \
      --display-name="wan-video-model" \
      --container-image-uri="YOUR_ARTIFACT_REGISTRY_REGION-docker.pkg.dev/YOUR_GCP_PROJECT_ID/YOUR_ARTIFACT_REGISTRY_REPO/YOUR_IMAGE_NAME:latest"
    # Optional flags to append to the upload command:
    #   --container-env-vars="GRADIO_APP_SCRIPT=gradio/t2v_14B_singleGPU.py"  # If serving the Gradio UI
    #   --container-ports=7860                                                # If serving the Gradio UI
    # For dedicated prediction (non-Gradio), you'd typically implement /predict and /health routes:
    #   --container-predict-route="/predict"
    #   --container-health-route="/health"

    # 2) Deploy the uploaded model (note its MODEL_ID from the output) to the endpoint.
    gcloud ai endpoints deploy-model ENDPOINT_ID \
      --project=YOUR_GCP_PROJECT_ID \
      --region=YOUR_VERTEX_AI_REGION \
      --model=MODEL_ID \
      --display-name="v1" \
      --machine-type=MACHINE_TYPE
    ```

    * Vertex AI is more commonly used for direct prediction endpoints. If serving a Gradio UI, ensure networking and port configurations are appropriate; accessing a UI might require additional setup (e.g., IAP).
## Step 5: Accessing your Deployed Application

* **Cloud Run:** The `gcloud run deploy` command will output a service URL.
* **Vertex AI Endpoint:** Typically accessed programmatically via the SDK or REST API; see the sketch below.
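A quick way to exercise a Vertex AI endpoint from the CLI is `gcloud ai endpoints predict`. The payload below is a placeholder; the actual instance schema depends on how the container's prediction route is implemented:

```bash
# request.json is a hypothetical payload; shape it to your /predict handler.
echo '{"instances": [{"prompt": "a cat surfing a wave"}]}' > request.json

gcloud ai endpoints predict ENDPOINT_ID \
  --project=YOUR_GCP_PROJECT_ID \
  --region=YOUR_VERTEX_AI_REGION \
  --json-request=request.json
```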
## Step 6: Checking Logs for Troubleshooting

* **Cloud Build Logs:** Google Cloud Console > Cloud Build > History.
* **Cloud Run Logs:** Google Cloud Console > Cloud Run > Select Service > Logs tab.
* **Vertex AI Endpoint Logs:** Google Cloud Console > Vertex AI > Endpoints > Select Endpoint > View Logs. Also available in Cloud Logging.

Look for errors related to dependency installation, model loading, resource limits (memory/CPU), or port configurations. The same logs can be pulled from the CLI, as sketched below.
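A couple of CLI equivalents (the service name and build ID are placeholders):

```bash
# Recent Cloud Run logs for a service, via Cloud Logging.
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="YOUR_IMAGE_NAME"' \
  --project=YOUR_GCP_PROJECT_ID --limit=50

# Find recent builds, then show the log for one of them.
gcloud builds list --limit=5
gcloud builds log BUILD_ID
```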
## Choosing Machine Types and Resources

* **CPU/Memory:** The 14B models are very demanding. Start with high CPU/memory (e.g., 4+ vCPUs, 16GB+ RAM) and scale up.
* **GPU:** Essential for the 14B models and highly recommended for the 1.3B models.
    * **Cloud Run:** Gen2 execution environment with GPU support.
    * **Vertex AI:** Offers a wide variety of GPUs (T4, V100, A100).
* **Model Sizes & Compatibility:**
    * The `nvcr.io/nvidia/pytorch:24.03-py3` base image ships a recent CUDA 12.x toolkit (check the NGC release notes for the exact version). Ensure chosen GPUs are compatible (e.g., Ampere, Hopper); a quick runtime check is sketched below.
    * For the 14B models, you'll likely need GPUs with large VRAM (e.g., A100 40GB or 80GB). Check the model's specific requirements.
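To verify what the built image actually reports, you can run PyTorch's CUDA probe inside the container (local Docker with the NVIDIA Container Toolkit; the image path follows Step 3):

```bash
docker run --rm --gpus all \
  YOUR_ARTIFACT_REGISTRY_REGION-docker.pkg.dev/YOUR_GCP_PROJECT_ID/YOUR_ARTIFACT_REGISTRY_REPO/YOUR_IMAGE_NAME:latest \
  python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```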
This guide provides a starting point. You may need to adjust configurations based on the specific model you are deploying and your performance requirements.
Dockerfile (new file, 48 lines)
# Use a PyTorch base image with CUDA support
# Check NVIDIA's NGC catalog or Docker Hub for suitable images
# Example: nvcr.io/nvidia/pytorch:24.03-py3 (Python 3.10; see the NGC release
# notes for the exact PyTorch and CUDA versions a given tag ships)
# Ensure the Python version in the base image is compatible with project dependencies.
# Python 3.10 is generally a safe bet for recent PyTorch versions.
FROM nvcr.io/nvidia/pytorch:24.03-py3

# Set the working directory
WORKDIR /app

# Install essential build tools and Python headers (if needed for some packages)
# RUN apt-get update && apt-get install -y --no-install-recommends \
#     build-essential \
#     python3-dev \
#     && rm -rf /var/lib/apt/lists/*

# Copy the requirements file first to leverage Docker layer caching
COPY requirements.txt .

# Install Python dependencies
# Upgrading pip, setuptools, and wheel first can prevent some build issues.
RUN pip install --upgrade pip setuptools wheel

# Special handling for flash-attn as per INSTALL.md
# Attempting the no-build-isolation method first.
# Ensure that the PyTorch and CUDA versions are compatible with flash-attn.
RUN pip install flash-attn --no-build-isolation

# Install other dependencies
# Using --no-cache-dir to reduce image size
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code
COPY . .

# Make generate.py executable (if it's intended to be run directly and has a shebang)
# RUN chmod +x generate.py

# Set environment variables (optional, can be overridden at runtime)
# Example: Set a default Gradio app to run
ENV GRADIO_APP_SCRIPT=gradio/t2v_14B_singleGPU.py
# ENV MODEL_CKPT_DIR=./Wan2.1-T2V-14B  # Or a path inside the container where models will be mounted/downloaded

# Expose the default Gradio port (usually 7860)
EXPOSE 7860

# Default command to run the Gradio application
# This assumes the Gradio apps are launched with `python <script_name>`.
# Users might need to adjust this based on how they want to run their app.
# Using `sh -c` to allow environment variable substitution in the command.
CMD ["sh", "-c", "python $GRADIO_APP_SCRIPT"]
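Before going through Cloud Build, it can be worth a local smoke test if you have Docker and the NVIDIA Container Toolkit installed (the tag name here is arbitrary):

```bash
# Build from the repo root.
docker build -t wan-video-local .

# Run the default Gradio app; override GRADIO_APP_SCRIPT to pick another script.
docker run --rm --gpus all -p 7860:7860 \
  -e GRADIO_APP_SCRIPT="gradio/t2v_14B_singleGPU.py" \
  wan-video-local
```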
cloudbuild.yaml (new file, 43 lines)
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: [
      'build',
      '-t',
      '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_REPOSITORY}/${_IMAGE_NAME}:${COMMIT_SHA}',
      '-t',
      '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_REPOSITORY}/${_IMAGE_NAME}:latest',
      '.'
    ]
    id: 'Build Docker image'

  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_REPOSITORY}/${_IMAGE_NAME}:${COMMIT_SHA}']
    id: 'Push image to Artifact Registry (tagged with commit SHA)'

  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_REPOSITORY}/${_IMAGE_NAME}:latest']
    id: 'Push image to Artifact Registry (tagged as latest)'

# Substitutions (users should replace these with their actual values or set them on a build trigger)
# _REGION: The region of your Artifact Registry repository (e.g., us-central1)
# _REPOSITORY: The name of your Artifact Registry repository (e.g., my-app-repo)
# _IMAGE_NAME: The name for your Docker image (e.g., wan-video-generator)

# Default values for substitutions (can be overridden)
substitutions:
  _REGION: 'us-central1' # Replace with your Artifact Registry region
  _REPOSITORY: 'my-app-images' # Replace with your Artifact Registry repository name
  _IMAGE_NAME: 'wan-video-service' # Replace with your desired image name

# Optional: list the images pushed to Artifact Registry.
# This allows them to be displayed in the build summary in the Google Cloud Console.
images:
  - '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_REPOSITORY}/${_IMAGE_NAME}:${COMMIT_SHA}'
  - '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_REPOSITORY}/${_IMAGE_NAME}:latest'

options:
  logging: CLOUD_LOGGING_ONLY
  # Optional: specify a larger machine type for faster builds
  # machineType: 'E2_HIGHCPU_8' # or 'N1_HIGHCPU_32'
  # Note: standard Cloud Build workers do not provide GPUs, so the image build
  # itself should not require GPU access.
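The substitution comments above mention setting values on a build trigger; a minimal sketch of creating a GitHub-connected trigger with `gcloud` (owner, repo, and branch pattern are placeholders, and the repository must already be connected to Cloud Build):

```bash
gcloud builds triggers create github \
  --name="wan-video-build" \
  --repo-owner="YOUR_GITHUB_OWNER" \
  --repo-name="Wan2.1" \
  --branch-pattern="^main$" \
  --build-config="cloudbuild.yaml" \
  --substitutions=_REGION="us-central1",_REPOSITORY="wan-video-repo",_IMAGE_NAME="wan-video-generator"
```

Trigger-initiated builds also populate `COMMIT_SHA`, so the per-commit image tag in `cloudbuild.yaml` works without further changes.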