mirror of https://github.com/Wan-Video/Wan2.1.git
synced 2025-11-04 06:15:17 +00:00

Merge 0543adc653 into 7c81b2f27d

This commit is contained in:
commit 90eba46893

DEPLOY_GCLOUD.md (143 lines, Normal file)
@ -0,0 +1,143 @@
# Deploying to Google Cloud

This guide explains how to build and deploy this application to Google Cloud using Docker, Cloud Build, and Cloud Run or Vertex AI Endpoints.

## Prerequisites

1.  **Google Cloud Project:** You have a Google Cloud project with billing enabled.
2.  **Enable APIs:** Ensure the following APIs are enabled in your project (a single `gcloud` command that enables them all follows this list):
    *   Cloud Build API
    *   Artifact Registry API
    *   Cloud Run API
    *   Vertex AI API (if deploying to Vertex AI Endpoints)
    *   Identity and Access Management (IAM) API
3.  **`gcloud` CLI:** You have the [Google Cloud CLI](https://cloud.google.com/sdk/docs/install) installed and configured.
4.  **Permissions:** You have sufficient permissions to create Artifact Registry repositories, trigger Cloud Builds, and deploy to Cloud Run/Vertex AI. Roles like `Artifact Registry Administrator`, `Cloud Build Editor`, `Cloud Run Admin`, and `Vertex AI Administrator` are typically needed.
5.  **Docker configured for gcloud:** Configure Docker to authenticate with Artifact Registry:
    ```bash
    gcloud auth configure-docker YOUR_ARTIFACT_REGISTRY_REGION-docker.pkg.dev
    ```
    Replace `YOUR_ARTIFACT_REGISTRY_REGION` with the region of your Artifact Registry (e.g., `us-central1`). You'll use the same region for the `_REGION` substitution in Cloud Build.
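
If any of these APIs still need enabling, the whole set from the list above can be turned on at once (the service names below are the standard identifiers for those APIs):

```bash
gcloud services enable \
  cloudbuild.googleapis.com \
  artifactregistry.googleapis.com \
  run.googleapis.com \
  aiplatform.googleapis.com \
  iam.googleapis.com \
  --project=YOUR_GCP_PROJECT_ID
```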

## Step 1: Configure Cloud Build Substitutions

The `cloudbuild.yaml` file uses substitutions for the Artifact Registry region, repository name, and image name, and you need to set these. (`PROJECT_ID` is a built-in substitution that Cloud Build fills in automatically from your active project.) The easiest way for a manual build is via the `gcloud` command line.

**Important:** Before running the build, decide on:

*   `YOUR_GCP_PROJECT_ID`: Your actual Google Cloud project ID.
*   `YOUR_ARTIFACT_REGISTRY_REGION`: The Google Cloud region for your Artifact Registry (e.g., `us-central1`).
*   `YOUR_ARTIFACT_REGISTRY_REPO`: The name for your Artifact Registry repository (e.g., `wan-video-repo`).
*   `YOUR_IMAGE_NAME`: The name for your Docker image (e.g., `wan-video-generator`).

The `cloudbuild.yaml` has default values for `_REGION`, `_REPOSITORY`, and `_IMAGE_NAME`. You can either:

*   Modify these defaults directly in `cloudbuild.yaml`, or
*   Override them when submitting the build using the `--substitutions` flag (recommended for flexibility). User-defined substitution keys must start with an underscore, which is why the built-in `PROJECT_ID` is not passed here.

Example of overriding:

`--substitutions=_REGION="us-central1",_REPOSITORY="wan-video-repo",_IMAGE_NAME="wan-video-generator"`

## Step 2: Create an Artifact Registry Repository

If you haven't already, create a Docker repository in Artifact Registry:

```bash
gcloud artifacts repositories create YOUR_ARTIFACT_REGISTRY_REPO \
  --repository-format=docker \
  --location=YOUR_ARTIFACT_REGISTRY_REGION \
  --description="Docker repository for Wan video generator"
```

Ensure `YOUR_ARTIFACT_REGISTRY_REPO` and `YOUR_ARTIFACT_REGISTRY_REGION` match the values you'll use for the `_REPOSITORY` and `_REGION` substitutions in Cloud Build.
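
To double-check that the repository exists before building, it can be described (same placeholders as above):

```bash
gcloud artifacts repositories describe YOUR_ARTIFACT_REGISTRY_REPO \
  --location=YOUR_ARTIFACT_REGISTRY_REGION
```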

## Step 3: Build and Push the Docker Image

Submit the build to Google Cloud Build:

```bash
# Run this from the root directory of the project, where cloudbuild.yaml is located.
# Replace the placeholders with your actual values.
gcloud builds submit --config cloudbuild.yaml \
  --substitutions=_REGION="YOUR_ARTIFACT_REGISTRY_REGION",_REPOSITORY="YOUR_ARTIFACT_REGISTRY_REPO",_IMAGE_NAME="YOUR_IMAGE_NAME" \
  .
```

This command uses `cloudbuild.yaml` to build your Docker image and push it to your Artifact Registry. Note that the `${COMMIT_SHA}` tag in `cloudbuild.yaml` is only populated for trigger-started builds; for a manual `gcloud builds submit` run it is empty, so either provide a value or drop those tags (see the comments in `cloudbuild.yaml`).
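
Once the build finishes, one way to confirm the image landed in the registry (placeholders as before):

```bash
gcloud artifacts docker images list \
  YOUR_ARTIFACT_REGISTRY_REGION-docker.pkg.dev/YOUR_GCP_PROJECT_ID/YOUR_ARTIFACT_REGISTRY_REPO
```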

## Step 4: Deploy the Container

You can deploy the container to Cloud Run or to Vertex AI Endpoints.

### Option A: Deploy to Cloud Run

Cloud Run is well suited to web applications such as the Gradio interfaces.

1.  **Basic deployment (CPU only):**

    ```bash
    # Replace the placeholders with your values.
    # YOUR_CLOUD_RUN_REGION should be a region where Cloud Run is available (e.g., us-central1).
    gcloud run deploy YOUR_IMAGE_NAME \
      --image YOUR_ARTIFACT_REGISTRY_REGION-docker.pkg.dev/YOUR_GCP_PROJECT_ID/YOUR_ARTIFACT_REGISTRY_REPO/YOUR_IMAGE_NAME:latest \
      --platform managed \
      --region YOUR_CLOUD_RUN_REGION \
      --allow-unauthenticated \
      --port 7860 \
      --set-env-vars GRADIO_APP_SCRIPT="gradio/t2v_14B_singleGPU.py" \
      --memory=4Gi \
      --cpu=2
    # Adjust memory, CPU, and other flags as needed.
    # For larger models, you will need significantly more memory and potentially more CPU.
    ```

    *   The `--port 7860` matches the `EXPOSE 7860` in the Dockerfile.
    *   Use `--set-env-vars` to specify which Gradio app to run via the `GRADIO_APP_SCRIPT` environment variable.
    *   The image path format is `REGION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE_NAME:latest`.
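
    After the deploy completes, a quick smoke test is to hit the URL Cloud Run reports; this sketch assumes the service name used above:

    ```bash
    URL=$(gcloud run services describe YOUR_IMAGE_NAME \
      --region YOUR_CLOUD_RUN_REGION --format 'value(status.url)')
    # Expect a 200 once the Gradio app has finished loading.
    curl -s -o /dev/null -w "%{http_code}\n" "$URL"
    ```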

2.  **Deployment with GPU:**

    Cloud Run supports GPUs (check availability and supported GPU types in your region).

    ```bash
    # Replace the placeholders. YOUR_CLOUD_RUN_REGION must support GPUs.
    gcloud beta run deploy YOUR_IMAGE_NAME-gpu \
      --image YOUR_ARTIFACT_REGISTRY_REGION-docker.pkg.dev/YOUR_GCP_PROJECT_ID/YOUR_ARTIFACT_REGISTRY_REPO/YOUR_IMAGE_NAME:latest \
      --platform managed \
      --region YOUR_CLOUD_RUN_REGION \
      --allow-unauthenticated \
      --port 7860 \
      --set-env-vars GRADIO_APP_SCRIPT="gradio/t2v_14B_singleGPU.py" \
      --memory=16Gi \
      --cpu=4 \
      --execution-environment gen2 \
      --gpu=1 --gpu-type=nvidia-l4  # Check which GPU types your region offers.
    # Consult the Cloud Run documentation for current GPU options.
    ```

3.  **Considerations for Cloud Run:**

    *   **Timeout:** The default request timeout is 5 minutes. Increase it if needed (max 60 minutes for Gen2); a sample command follows this list.
    *   **Concurrency:** Adjust based on instance capacity.
    *   **Model Loading & Storage:**
        *   Models are currently packaged in the Docker image, which increases image size.
        *   For very large models, consider downloading them at startup from Google Cloud Storage (GCS) into the container. This requires modifying the Docker `CMD` or `ENTRYPOINT`; a sketch follows this list.
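
    A minimal sketch of both adjustments. The timeout update is a standard `gcloud` call; the startup script is a hypothetical example (the bucket path, the `entrypoint.sh` name, and the `/app/models` directory are assumptions, not files in this repo):

    ```bash
    # Raise the request timeout on an existing service (value in seconds).
    gcloud run services update YOUR_IMAGE_NAME \
      --region YOUR_CLOUD_RUN_REGION --timeout=3600
    ```

    ```sh
    #!/bin/sh
    # entrypoint.sh: fetch weights from GCS on first start, then launch Gradio.
    # Wire it up in the Dockerfile with: CMD ["sh", "entrypoint.sh"]
    set -e
    if [ ! -d /app/models ]; then
      mkdir -p /app/models
      # gsutil ships with the gcloud SDK; install it in the image if absent.
      gsutil -m cp -r gs://YOUR_BUCKET/wan-models/* /app/models/
    fi
    exec python "$GRADIO_APP_SCRIPT"
    ```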

### Option B: Deploy to Vertex AI Endpoints

Vertex AI Endpoints are better suited to dedicated ML model serving and offer more powerful hardware options.

1.  **Create an Endpoint:**

    ```bash
    # Replace the placeholders. YOUR_VERTEX_AI_REGION is e.g. us-central1.
    gcloud ai endpoints create \
      --project=YOUR_GCP_PROJECT_ID \
      --region=YOUR_VERTEX_AI_REGION \
      --display-name="wan-video-endpoint"
    ```

    Note the `ENDPOINT_ID` from the output.
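
    If the ID scrolls past, it can be recovered from the endpoint list (filtering on the display name used above):

    ```bash
    gcloud ai endpoints list \
      --project=YOUR_GCP_PROJECT_ID \
      --region=YOUR_VERTEX_AI_REGION \
      --filter='displayName="wan-video-endpoint"' \
      --format='value(name)'
    ```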

2.  **Deploy the model (container) to the Endpoint:**

    The `gcloud ai` CLI splits this into two commands: upload the container as a model, then deploy that model to the endpoint.

    ```bash
    # Replace the placeholders.
    # ENDPOINT_ID is from the previous command.
    # MACHINE_TYPE can be n1-standard-4, or a GPU-backed type like a2-highgpu-1g.
    gcloud ai models upload \
      --project=YOUR_GCP_PROJECT_ID \
      --region=YOUR_VERTEX_AI_REGION \
      --display-name="wan-video-model" \
      --container-image-uri="YOUR_ARTIFACT_REGISTRY_REGION-docker.pkg.dev/YOUR_GCP_PROJECT_ID/YOUR_ARTIFACT_REGISTRY_REPO/YOUR_IMAGE_NAME:latest"
      # Optional flags for the upload:
      # --container-env-vars="GRADIO_APP_SCRIPT=gradio/t2v_14B_singleGPU.py" # If serving the Gradio UI
      # --container-ports=7860 # If serving the Gradio UI
      # For dedicated prediction (non-Gradio), you'd typically implement /predict and /health routes:
      # --container-predict-route="/predict"
      # --container-health-route="/health"

    # MODEL_ID is printed by the upload command (or find it with `gcloud ai models list`).
    gcloud ai endpoints deploy-model ENDPOINT_ID \
      --project=YOUR_GCP_PROJECT_ID \
      --region=YOUR_VERTEX_AI_REGION \
      --model=MODEL_ID \
      --display-name="v1" \
      --machine-type=MACHINE_TYPE \
      --traffic-split=0=100
    ```

    *   Vertex AI is more commonly used for direct prediction endpoints. If serving a Gradio UI, ensure networking and port configurations are appropriate. Accessing a UI might require additional setup (e.g., IAP).

## Step 5: Accessing your Deployed Application

*   **Cloud Run:** The `gcloud run deploy` command will output a service URL.
*   **Vertex AI Endpoint:** Typically accessed programmatically via the SDK or REST API; a sample request follows.
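
A minimal REST call against the endpoint's `:predict` route looks like the following. The URL shape is standard Vertex AI; the JSON payload is only an assumption here, since its schema is defined by whatever prediction handler your container implements:

```bash
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://YOUR_VERTEX_AI_REGION-aiplatform.googleapis.com/v1/projects/YOUR_GCP_PROJECT_ID/locations/YOUR_VERTEX_AI_REGION/endpoints/ENDPOINT_ID:predict" \
  -d '{"instances": [{"prompt": "A cat walks on the grass, realistic style."}]}'
```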

## Step 6: Checking Logs for Troubleshooting

*   **Cloud Build Logs:** Google Cloud Console > Cloud Build > History.
*   **Cloud Run Logs:** Google Cloud Console > Cloud Run > Select Service > Logs tab.
*   **Vertex AI Endpoint Logs:** Google Cloud Console > Vertex AI > Endpoints > Select Endpoint > View Logs. Also available in Cloud Logging.

Look for errors related to dependency installation, model loading, resource limits (memory/CPU), or port configurations.
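
The same logs can also be pulled from the terminal via Cloud Logging; this example filters for the Cloud Run service deployed above (swap the resource type and labels for Vertex AI):

```bash
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="YOUR_IMAGE_NAME"' \
  --project=YOUR_GCP_PROJECT_ID --limit=50 --freshness=1h
```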

## Choosing Machine Types and Resources

*   **CPU/Memory:** The 14B models are very demanding. Start with high CPU/memory (e.g., 4+ vCPUs, 16GB+ RAM) and scale up.
*   **GPU:** Essential for the 14B models and highly recommended for the 1.3B models.
    *   **Cloud Run:** Gen2 execution environment with GPU support.
    *   **Vertex AI:** Offers a wide variety of GPUs (T4, V100, A100).
*   **Model Sizes & Compatibility:**
    *   The `nvcr.io/nvidia/pytorch:24.03-py3` base image ships a CUDA 12.x toolchain; ensure the chosen GPUs are compatible (e.g., Ampere, Hopper).
    *   For 14B models, you'll likely need GPUs with large VRAM (e.g., A100 40GB or 80GB). Check the model's specific requirements.

This guide provides a starting point. You may need to adjust configurations based on the specific model you are deploying and your performance requirements.

Dockerfile (48 lines, Normal file)
@ -0,0 +1,48 @@
# Use a PyTorch base image with CUDA support.
# Check NVIDIA's NGC catalog or Docker Hub for suitable images.
# Example: nvcr.io/nvidia/pytorch:24.03-py3 (Python 3.10; see the NGC release
# notes for the exact PyTorch and CUDA versions it ships).
# Ensure the Python version in the base image is compatible with the project
# dependencies; Python 3.10 is generally a safe bet for recent PyTorch versions.
FROM nvcr.io/nvidia/pytorch:24.03-py3

# Set the working directory
WORKDIR /app

# Install essential build tools and Python headers (if needed by some packages)
# RUN apt-get update && apt-get install -y --no-install-recommends \
#     build-essential \
#     python3-dev \
#  && rm -rf /var/lib/apt/lists/*

# Copy the requirements file first to leverage Docker layer caching
COPY requirements.txt .

# Install Python dependencies.
# Upgrading pip, setuptools, and wheel first can prevent some build issues.
RUN pip install --upgrade pip setuptools wheel

# Special handling for flash-attn, as per INSTALL.md: attempt the
# no-build-isolation method first, and ensure the PyTorch and CUDA versions
# in the base image are compatible with flash-attn.
RUN pip install flash-attn --no-build-isolation

# Install the remaining dependencies.
# --no-cache-dir keeps the pip cache out of the image to reduce its size.
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code
COPY . .

# Make generate.py executable (if it's intended to be run directly and has a shebang)
# RUN chmod +x generate.py

# Set environment variables (optional, can be overridden at runtime).
# Example: set a default Gradio app to run.
ENV GRADIO_APP_SCRIPT=gradio/t2v_14B_singleGPU.py
# ENV MODEL_CKPT_DIR=./Wan2.1-T2V-14B # Or a path inside the container where models will be mounted/downloaded

# Expose the default Gradio port (usually 7860)
EXPOSE 7860

# Default command to run the Gradio application.
# This assumes the Gradio apps are launched with `python <script_name>`;
# adjust as needed for your app. `sh -c` allows environment variable
# substitution in the command.
CMD ["sh", "-c", "python $GRADIO_APP_SCRIPT"]

cloudbuild.yaml (43 lines, Normal file)
@ -0,0 +1,43 @@
steps:
- name: 'gcr.io/cloud-builders/docker'
  args: [
    'build',
    '-t',
    '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_REPOSITORY}/${_IMAGE_NAME}:${COMMIT_SHA}',
    '-t',
    '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_REPOSITORY}/${_IMAGE_NAME}:latest',
    '.'
  ]
  id: 'Build Docker image'

- name: 'gcr.io/cloud-builders/docker'
  args: ['push', '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_REPOSITORY}/${_IMAGE_NAME}:${COMMIT_SHA}']
  id: 'Push image to Artifact Registry (tagged with commit SHA)'

- name: 'gcr.io/cloud-builders/docker'
  args: ['push', '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_REPOSITORY}/${_IMAGE_NAME}:latest']
  id: 'Push image to Artifact Registry (tagged as latest)'

# Substitutions (replace these with your actual values or set them when submitting the build):
# _REGION: the region of your Artifact Registry repository (e.g., us-central1)
# _REPOSITORY: the name of your Artifact Registry repository (e.g., my-app-repo)
# _IMAGE_NAME: the name for your Docker image (e.g., wan-video-generator)
# PROJECT_ID and COMMIT_SHA are built-in substitutions. COMMIT_SHA is only set
# for trigger-started builds; for manual `gcloud builds submit` runs, provide a
# value for it or drop the ${COMMIT_SHA} tags above.

# Default values for substitutions (can be overridden)
substitutions:
  _REGION: 'us-central1' # Replace with your Artifact Registry region
  _REPOSITORY: 'my-app-images' # Replace with your Artifact Registry repository name
  _IMAGE_NAME: 'wan-video-service' # Replace with your desired image name

# Optional: list the images to be pushed to Artifact Registry.
# This also makes them appear in the build summary in the Google Cloud Console.
images:
- '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_REPOSITORY}/${_IMAGE_NAME}:${COMMIT_SHA}'
- '${_REGION}-docker.pkg.dev/${PROJECT_ID}/${_REPOSITORY}/${_IMAGE_NAME}:latest'

options:
  logging: CLOUD_LOGGING_ONLY
  # Optional: specify a machine type for faster builds
  # machineType: 'E2_HIGHCPU_8' # or 'N1_HIGHCPU_32'
  # Note: Cloud Build workers do not attach GPUs; the image itself is built CPU-only.