AWS CNI Prefix not propagated to nodes
**_Issue as reported by user:_**

- Having an issue with AWS CNI prefix support.
- Created a new AMI based on service catalog 0.75.0.
- Updated the eks-cluster `terragrunt.hcl` to 0.75.0 (`terragrunt apply`), using macOS and the default Python version:

```
$ python -V
Python 2.7.18
```

- Issued `kubergrunt eks sync-core-components`:

```
$ kubergrunt -v
kubergrunt version v0.8.0
```

- I see that the EKS worker node supports 110 pods, but it seems prefix delegation is not set to true in aws-node.
- Created an nginx deployment with 80 replicas and all of them failed:

```
Warning  FailedCreatePodSandBox  2m14s (x239 over 11m)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "96bee0c11c960eab5b6de6d881d9e092ffef9cd07c7c4a3b02de256902e375b0" network for pod "nginx-deployment-6c8f99b66f-79rlj": networkPlugin cni failed to set up pod "nginx-deployment-6c8f99b66f-79rlj_default" network: add cmd: failed to assign an IP address to container

$ kubectl describe pod -n kube-system aws-node
    Readiness:  exec [/app/grpc-health-probe -addr=:50051] delay=1s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ADDITIONAL_ENI_TAGS:                  {}
      AWS_VPC_CNI_NODE_PORT_SUPPORT:        true
      AWS_VPC_ENI_MTU:                      9001
      AWS_VPC_K8S_CNI_CONFIGURE_RPFILTER:   false
      AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG:   false
      AWS_VPC_K8S_CNI_EXTERNALSNAT:         false
      AWS_VPC_K8S_CNI_LOGLEVEL:             DEBUG
      AWS_VPC_K8S_CNI_LOG_FILE:             /host/var/log/aws-routed-eni/ipamd.log
      AWS_VPC_K8S_CNI_RANDOMIZESNAT:        prng
      AWS_VPC_K8S_CNI_VETHPREFIX:           eni
      AWS_VPC_K8S_PLUGIN_LOG_FILE:          /var/log/aws-routed-eni/plugin.log
      AWS_VPC_K8S_PLUGIN_LOG_LEVEL:         DEBUG
      DISABLE_INTROSPECTION:                false
      DISABLE_METRICS:                      false
      ENABLE_POD_ENI:                       false
      ENABLE_PREFIX_DELEGATION:             false
      MY_NODE_NAME:                          (v1:spec.nodeName)
      WARM_ENI_TARGET:                      1
      WARM_PREFIX_TARGET:                   1
    Mounts:
```

```
  # module.eks_cluster.null_resource.customize_aws_vpc_cni[0] will be created
  + resource "null_resource" "customize_aws_vpc_cni" {
      + id       = (known after apply)
      + triggers = {
          + "eks_cluster_endpoint"           = "https://xyz.gr7.eu-west-1.eks.amazonaws.com/"
          + "enable_prefix_delegation"       = "true"
          + "sync_core_components_action_id" = "1670507111626965214"
        }
    }
```

- It seems the user-data was updated, but not the aws-node deployment.

```
# From: https://alestic.com/2010/12/ec2-user-data-output/
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1

# Include common functions
source /etc/user-data/user-data-common.sh

function configure_eks_instance {
  local -r aws_region="$1"
  local -r eks_cluster_name="$2"
  local -r eks_endpoint="$3"
  local -r eks_certificate_authority="$4"
  local -r node_labels="$(map-ec2-tags-to-node-labels)"

  start_fail2ban

  local max_pods
  max_pods="$(/etc/eks/max-pods-calculator.sh --instance-type-from-imds --cni-version 1.9.0-eksbuild.1 --cni-prefix-delegation-enabled)"

  echo "Running eks bootstrap script to register instance to cluster"
  /etc/eks/bootstrap.sh \
    --apiserver-endpoint "$eks_endpoint" \
    --b64-cluster-ca "$eks_certificate_authority" \
    --kubelet-extra-args "--node-labels=\"$node_labels\" --max-pods=$max_pods " \
    --use-max-pods false \
    "$eks_cluster_name"
}
```

```
$ kubectl describe node ip-10-4-90-172.eu-west-1.compute.internal
Allocatable:
  attachable-volumes-aws-ebs:  25
  cpu:                         1930m
  ephemeral-storage:           37569620724
  hugepages-1Gi:               0
  hugepages-2Mi:               0
  memory:                      3412448Ki
  pods:                        110
```

The issue here is that you ran `kubergrunt eks sync-core-components` manually, instead of relying on the module.

Due to the way the `aws-vpc-cni` is managed and how prefix delegation is enabled, each run of `kubergrunt eks sync-core-components` resets the `aws-vpc-cni` daemonset to its default initial settings, where prefix delegation is not enabled. So when you ran the command manually, it reset the settings and disabled prefix delegation.

In the module, we work around this by running `kubergrunt eks sync-core-components` first, and then updating the daemonset with the relevant environment variables.

You can restore prefix delegation as follows:

- First, run `terragrunt apply` or `terraform apply` with the variable `vpc_cni_enable_prefix_delegation = false`. This resets the terraform state so that it is in sync with the current state of the cluster, where prefix delegation is disabled.
- Next, run `terragrunt apply` or `terraform apply` again with the variable `vpc_cni_enable_prefix_delegation = true`. This triggers the terraform `local-exec` call that sets the daemonset environment variables to turn prefix delegation on.
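
After the second apply, it may help to confirm that the daemonset actually picked up the setting. The following is a minimal sketch, not part of the module: `prefix_delegation_value` is a hypothetical helper name, and it assumes the `kubectl describe` output has the same `NAME: value` layout as the environment listing in the report above.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not provided by kubergrunt or the module).
# Reads `kubectl describe` output on stdin and prints the first value of
# ENABLE_PREFIX_DELEGATION. Returns non-zero if the variable is not present.
prefix_delegation_value() {
  awk '!found && $1 == "ENABLE_PREFIX_DELEGATION:" { print $2; found = 1 }
       END { exit !found }'
}

# Usage against a live cluster (requires kubectl access to the cluster):
#   kubectl describe daemonset aws-node -n kube-system | prefix_delegation_value
```

If this prints `false` after the second apply, the daemonset has most likely been reset again, for example by another manual `kubergrunt eks sync-core-components` run.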