Back to index

4.14.0-0.okd-scos-2024-01-27-093745

Jump to: Complete Features | Incomplete Features | Complete Epics | Incomplete Epics | Other Complete | Other Incomplete |

Changes from 4.13.0-0.okd-scos-2024-04-09-152021

Note: this page shows the Feature-Based Change Log for a release

Complete Features

These features were completed when this image was assembled

Goal

Improve the kubevirt-csi storage plugin features and integration as we make progress towards the GA of a KubeVirt provider for HyperShift.

User Stories

  • "As a hypershift user,
    I want infra cluster StorageClasses made available to guest clusters,
    so that guest clusters can have persistent storage available."

Infra storage classes made available to guest clusters must support:

  • RWX AccessMode
  • Filesystem and Block VolumeModes

Non-Requirements

  • VolumeSnapshots
  • CSI Clone
  • Volume Expansion

Notes

  • Any additional details or decisions made/needed

Done Checklist

Who | What | Reference
DEV | Upstream roadmap issue (or individual upstream PRs) | <link to GitHub Issue>
DEV | Upstream documentation merged | <link to meaningful PR>
DEV | gap doc updated | <name sheet and cell>
DEV | Upgrade consideration | <link to upgrade-related test or design doc>
DEV | CEE/PX summary presentation | label epic with cee-training and add a <link to your support-facing preso>
QE | Test plans in Polarion | <link or reference to Polarion>
QE | Automated tests merged | <link or reference to automated tests>
DOC | Downstream documentation merged | <link to meaningful PR>

Based on the perf scale team's results, enabling multiqueue with jumbo frames (MTU >= 9000) can greatly improve throughput, as seen by comparing slides 8 and 10 in this slide deck: https://docs.google.com/presentation/d/1cIm4EcAswVDpuDp-eHVmbB7VodZqQzTYCnx4HCfI9n4/edit#slide=id.g2563dda6aa5_1_68
However, enabling multiqueue with a small MTU causes "throughput to crater".

This task involves adding an API option to the KubeVirt platform within the NodePool API, as well as adding a CLI option for enabling multiqueue in the hcp CLI (the new productized CLI).
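For illustration, a hedged sketch of how this might surface on the NodePool API; the field name (networkInterfaceMultiqueue) is an assumption based on the description above, not a confirmed detail:

apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: example-kubevirt-nodepool
  namespace: clusters
spec:
  clusterName: example
  replicas: 2
  platform:
    type: KubeVirt
    kubevirt:
      # assumed field: enables virtio multiqueue on guest network interfaces
      networkInterfaceMultiqueue: Enable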

The HyperShift KubeVirt platform only supports guest clusters running OCP 4.14 or greater (because the KubeVirt RHCOS image is only delivered starting in 4.14).

It also only supports OCP 4.14 and CNV 4.14 for the infra cluster.

 

Add backend validation on the HostedCluster that validates the parameters are correct before the hosted cluster is processed. If these conditions are not met, report the error back as a condition on the HostedCluster CR.

Problem Alignment

The Problem

Many customers still predominantly use logs as a main source to capture data that's important for quickly identifying problems. Many issues can also be identified by metrics, but there are some events, such as suspicious IP address activity in security or runtime system issues such as host errors, where logs are your friend. OpenShift currently only supports defining alerting rules and getting notified based on metrics. That leaves a big gap in identifying, and being notified about, the previously mentioned events immediately.

High-Level Approach

As we move the Logging stack towards using Loki (see OBSDA-7), we will be able to use its out-of-the-box capabilities to define alerting rules on logs using LogQL. That approach is very similar to Prometheus' alerting ecosystem and actually gives us the opportunity to reuse Prometheus' Alertmanager to distribute alerts/notifications. For customers, this means they do not need to configure different channels twice, for metrics and for logs, but can reuse the same configuration.

For the configuration itself, we need to look into introducing a CRD (similar to the PrometheusRule CRD inside the Prometheus Operator) to allow users with non-admin permissions to configure the rules without changing the central Loki configuration.

Goal & Success

  • Allow individual users to configure alerting rules based on patterns inside a log record.

Solution Alignment

Key Capabilities

  • As an Application SRE, I'd like to configure SLIs to get alerted when the number of messages that meet some criteria (e.g. errors) exceeds a particular threshold.
  • As an Application SRE, I'd like to configure where alerts will be sent so that I get notified on the right channels.

Key Flows

Open Questions & Key Decisions (optional)

  • Do we provide integration into Prometheus Alertmanager only and if so, how?
    • Note: We could integrate into our Monitoring's Alertmanager automatically, but what happens if a customer decides to use an external Alertmanager and configures that inside Monitoring? I think we need to discuss this with the Monitoring team and identify whether we actually want a more centralized approach to alerting, as opposed to dividing it into metrics and logs with a dedicated instance for each. I think that's another perfect use case for why Observatorium would be better to use in general in the future. It combines the metrics and log stacks into one single deployment, and we could expose only a single Alertmanager and a configuration for pointing Prometheus and Loki to an external instance if necessary.

Goals

  1. Enable OpenShift Application owners to create alerting rules based on logs scoped only for applications they have access to.
  2. Provide support for notifying on firing alerts in OpenShift Console

Non-Goals

  1. Provide support for logs-based metrics that can be used in PrometheusRule custom resources for alerting.

Motivation

Since OpenShift 4.6, application owners can configure alerting rules based on metrics themselves as described in User Workload Monitoring (UWM) enhancement. The rules are defined as PrometheusRule resources and can be based on platform and/or application metrics.

To expand the alerting capabilities on logs as an observability signal, cluster admins and application owners should be able to configure alerting rules as described in the Loki Rules docs and in the Loki Operator Ruler upstream enhancement.

The AlertingRule CRD fulfills the requirement to define alerting rules for Loki, similar to PrometheusRule.

The RulerConfig CRD fulfills the requirement to connect the Loki Ruler component to a list of Prometheus Alertmanager hosts to notify on firing alerts.
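For illustration, a rough sketch of an AlertingRule with a LogQL expression, following the upstream Loki Operator CRD; the selectors, labels, and thresholds are placeholders:

apiVersion: loki.grafana.com/v1
kind: AlertingRule
metadata:
  name: app-error-alerts
  namespace: my-app                  # rules live in the application namespace
spec:
  tenantID: application
  groups:
    - name: errors
      interval: 1m
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate({kubernetes_namespace_name="my-app"} |= "error" [5m])) > 10
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: High rate of error log lines in my-app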

Alternatives

  1. Use only the RecordingRule CRD to export logs as metrics first and rely on present cluster-monitoring/user-workload-monitoring alerting capabilities.

Acceptance Criteria

  1. OpenShift Application owners receive notifications for application logs-based alerts on the same OpenShift Console Alerts view as with metrics.

Risk and Assumptions

  1. Assuming that the present OpenShift Console implementation of the Alerts view is compatible with listing and managing alerts from Alertmanager that originate from Loki.
  2. Assuming that the present UWM tenancy model applies to the logs-based alerts.

Documentation Considerations

Open Questions

Additional Notes

  1. Enhancement proposal: Cluster Logging: Logs-Based Alerts

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

 

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Complete during New status.

 

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context that is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Design Doc:

https://docs.google.com/document/d/1m6OYdz696vg1v8591v0Ao0_r_iqgsWjjM2UjcR_tIrM/

Problem:

Goal

As a developer, I want to be able to test my serverless function after it's been deployed.

Why is it important?

Use cases:

  1. As a developer, I want to test my serverless function 

Acceptance criteria:

  1. This feature needs to work in ACM (multi-cluster environments where the console is being run on the hub cluster)

Dependencies (External/Internal):

Please add a spike to see if there are dependencies.

Design Artifacts:

Exploration:

Developers can use the kn func invoke CLI to accomplish this. According to Naina, there is an API, but it's in Go.

Note:

Description

The current YAMLEditor also supports other languages like JSON, so the component needs to be renamed.

Acceptance Criteria

  1. Rename all instances of YAMLEditor to CodeEditor

Additional Details:

Description

As a user, I want to invoke a Serverless function from the developer console. This action should be available as a page and as a modal.

Acceptance Criteria

  1. A backend proxy to invoke a serverless function (or a k8s service in general) from the frontend without a public route.
  2. The API endpoint should be only accessible to logged-in users.
  3. Should also work when the bridge is running off-cluster (as developers start them mostly for local development)

Additional Details:

This will be similar to the web terminal proxy, except that no auth headers will be passed to the underlying service.

We need something similar to:

POST /proxy/in-cluster

{
  endpoint: string                            // or just `service: string`? tbd.
  headers: Record<string, string | string[]>
  body: string
  timeout: number
}

Description

As a user, I want to invoke a Serverless function from the developer console. This action should be available as a page and as a modal.

This story depends on ODC-7273, ODC-7274, and ODC-7288. This story should bring the backend proxy, and the frontend together and finalize the work.

Acceptance Criteria

  1. Write proper types if they are missed
  2. Connect the form and invoke a serverless function, consume and show the response
  3. Unit tests
  4. E2E tests

Additional Details:

Description

As a user, I want to invoke a Serverless function from the developer console. This action should be available as a page and as a modal.

This story is to evaluate a good UI for this and check this with our PM (Serena) and the Serverless team (Naina and Lance).

Acceptance Criteria

  1. Add a new page with the title "Invoke Serverless function {function-name}", available via a new URL (/serverless/ns/:ns/invoke-function/:function-name/).
  2. Implement a form with Formik to "invoke" (console.log for now) Serverless functions, without writing the network call for this yet. Focus on the UI to get feedback as early as possible. Use reusable, well-named components anyway.
  3. The page should also be available as a modal. Add a new action to all Serverless Services with the label (tbd) to open this modal from the Topology graph or from the Serverless Service list view.
  4. The page should have two tabs or two panes for the request and response. Each of these tabs/panes should again have two tabs, "similar" to the browser network inspector. See below for what we know currently.
  5. Get confirmation from Christoph, Serena, Naina, and Lance.
  6. Disable the action until we implement the network communication in ODC-7275 with the serverless function.
  7. No e2e tests are needed for this story.

Additional Details:

Information the form should show:

  1. Request tab shows "Body" and "Options" tab
    1. Body is just a full size editor. We should reuse our code editor.
    2. Options contains:
      1. Auto complete text field “Content type” with placeholder “application/json”, that will be used when nothing is entered
      2. Dropdown “Format” with values “cloudevent” (default) and “http”
      3. Text field “Type” with placeholder text “boson.fn”, that will be used when nothing is entered
      4. Text field “Source” with placeholder “/boson/fn”, that will be used when nothing is entered
  2. Response tab shows Body and Info tab
    1. Body is a full size editor that shows the response. We should format a JSON string with JSON.stringify(data, null, 2)
    2. Info contains:
      1. Id (id)
      2. Type (type)
      3. Source (source)
      4. Time (time) (formatted)
      5. Content-Type: (datacontenttype)

< High-Level description of the feature ie: Executive Summary >

Goals

Cluster administrators need an in-product experience to discover and install new Red Hat offerings that can add high value to developer workflows.

Requirements

Requirement | Notes | Is MVP?
Discover new offerings in Home Dashboard | | Y
Access details outlining value of offerings | | Y
Access step-by-step guide to install offering | | N
Allow developers to easily find and use newly installed offerings | | Y
Support air-gapped clusters | | Y

(Optional) Use Cases

< What are we making, for who, and why/what problem are we solving?>

Out of scope

Discovering solutions that are not available for installation on cluster

Dependencies

No known dependencies

Background, and strategic fit

 

Assumptions

None

 

Customer Considerations

 

Documentation Considerations

Quick Starts 

What does success look like?

 

QE Contact

 

Impact

 

Related Architecture/Technical Documents

 

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment

Problem:

Developers using Dev Console need to be made aware of the RH developer tooling available to them.

Goal:

Provide awareness to developers using Dev Console of the RH developer tooling that is available to them, including:

Consider enhancing the +Add page and/or the Guided tour

Provide a Quick Start for installing the Cryostat Operator

Why is it important?

To increase usage of our RH portfolio

Acceptance criteria:

  1. Quick Start - Installing Cryostat Operator
  2.  Quick Start - Get started with JBoss EAP using a Helm Chart
  3. Discoverability of the IDE extensions from Create Serverless form
  4. Update Terminal step of the Guided Tour to indicate that odo CLI is accessible (link to https://developers.redhat.com/products/odo/overview)

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description

Update Terminal step of the Guided Tour to indicate that odo CLI is accessible - https://developers.redhat.com/products/odo/overview

Acceptance Criteria

  1. Update the Guided Tour of the Web Terminal to add an odo CLI link
  2. On click of the link, the user is redirected to the respective page

Additional Details:

Description

This story is to add new Quick Start for installing the Cryostat Operator

Acceptance Criteria

  1. Create new Quick Start for installing the Cryostat Operator

Additional Details:

Description

Add the IDE extensions below to the create Serverless form:

Acceptance Criteria

  1. In the create Serverless form, add the above IDE extensions
  2. On click of a link, the user is taken to the respective page
  3. Add e2e tests for this

Additional Details:

Description 

Add OpenShift Quickstart for JBoss EAP 7

Acceptance Criteria

  1. Add OpenShift Quickstart for JBoss EAP 7

Additional Details:

We are deprecating DeploymentConfig in favor of Deployment in OpenShift because Deployment is the recommended way to deploy applications. Deployment is a more flexible and powerful resource that allows you to control the deployment of your applications more precisely. DeploymentConfig is a legacy resource that is no longer necessary. We will continue to support DeploymentConfig for a period of time, but we encourage you to migrate to Deployment as soon as possible.

Here are some of the benefits of using Deployment over DeploymentConfig:

  • Deployment is more flexible. You can specify the number of replicas to deploy, the image to deploy, and the environment variables to use.
  • Deployment is more powerful. You can use Deployment to roll out changes to your applications in a controlled manner.
  • Deployment is the recommended way to deploy applications. OpenShift will continue to improve Deployment and make it the best way to deploy applications.

We hope that you will migrate to Deployment as soon as possible. If you have any questions, please contact us.

Epic Goal

  • Make it possible to disable the DeploymentConfig and BuildConfig APIs, and associated controller logic.

 

Given the nature of this component (embedded into a shared api server and controller manager), this will likely require adding logic within those shared components to not enable specific bits of function when the build or DeploymentConfig capability is disabled, and watching the enabled capability set so that the components enable the functionality when necessary.

I would not expect us to split the components out of their existing location as part of this, though that is theoretically an option.
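For context, optional capabilities are selected at install time. A hedged sketch of an install-config capabilities stanza that leaves the Build and DeploymentConfig capabilities disabled (the exact capability names are assumptions based on this epic):

capabilities:
  baselineCapabilitySet: None
  additionalEnabledCapabilities:    # enable only what the cluster needs;
    - marketplace                   # Build and DeploymentConfig stay disabled
    - Console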

 

Why is this important?

  • Reduces resource footprint and bug surface area for clusters that do not need to utilize the DeploymentConfig or BuildConfig functionality, such as SNO and OKE.

Acceptance Criteria (Mandatory)

  • CI - MUST be running successfully with tests automated (we have an existing CI job that runs a cluster with all optional capabilities disabled.  Passing that job will require disabling certain deploymentconfig tests when the cap is disabled)
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. Cluster install capabilities

Previous Work (Optional):

  1. The optional cap architecture and guidance for adding a new capability is described here: https://github.com/openshift/enhancements/blob/master/enhancements/installer/component-selection.md

Open questions::

None

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment

Make the list of enabled/disabled controllers in OAS reflect enabled/disabled capabilities.

Acceptance criteria:

  • OAS allows specifying a list of enabled/disabled APIs (e.g. watches, caches, ...)
  • OASO watches capabilities and generates the right configuration for OAS with the list of enabled/disabled APIs
  • Documentation is properly updated

QE:

  • Enable/disable capabilities and validate that a given API (DC, Builds, ...) is/is not managed by the cluster:
  • check that the OAS logs do/do not contain entries about the affected API(s)
  • DC/Build objects are created/fail to be created

Feature Overview

At the moment, HyperShift is relying on an older etcd operator (i.e., the CoreOS etcd operator). However, this operator is basic and does not support HA as required.

Goals

Introduce a reliable component to operate Etcd that: 

  • Is backed by a stable operator
  • Supports images referenced by hash
  • Supports backups
  • Local persistent volumes for persistent data?
  • Encryption
  • HA and scalability

 

Following on from https://issues.redhat.com/browse/HOSTEDCP-444, we need to add the steps to enable migration of the Node/CAPI resources so that workloads can continue running during control plane migration.

This will be a manual process during which control plane downtime will occur.

 

This must satisfy the following criteria for a successful migration:

  • All HC conditions are positive.
  • All NodePool conditions are positive.
  • All service endpoints kas/oauth/ignition server... are reachable.
  • Ability to create/scale NodePools remains operational.

We need to validate and document this manually for starters.

Eventually this should be automated in the upcoming e2e test.

We could even have a job running conformance tests over a migrated cluster.

Epic Goal

As an OpenShift on vSphere administrator, I want to specify static IP assignments to my VMs.

As an OpenShift on vSphere administrator, I want to completely avoid using a DHCP server for the VMs of my OpenShift cluster.

Why is this important?

Customers want the convenience of IPI deployments for vSphere without having to use DHCP. As in bare metal, where METAL-1 added this capability, some of the reasons are the security implications of DHCP (customers report that, for example, depending on configuration it allows any device to get into the network). At the same time, IPI deployments only require our OpenShift installation software, while with UPI customers would need automation software that, in secure environments, they would have to certify along with OpenShift.

Acceptance Criteria

  • I can specify static IPs for node VMs at install time with IPI (see the sketch below)
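A hedged sketch of what this could look like in install-config; the hosts/networkDevice field names reflect my understanding of the tech-preview schema and may not match the final API:

platform:
  vsphere:
    hosts:
      - role: control-plane
        networkDevice:
          ipAddrs:
            - 192.168.100.10/24
          gateway: 192.168.100.1
          nameservers:
            - 192.168.100.2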

Previous Work

Bare metal related work:

CoreOS Afterburn:

https://github.com/coreos/afterburn/blob/main/src/providers/vmware/amd64.rs#L28

https://github.com/openshift/installer/blob/master/upi/vsphere/vm/main.tf#L34

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>


USER STORY:

As an OpenShift administrator, I want to apply an IP configuration so that I can adhere to my organizations security guidelines.

DESCRIPTION:

The vSphere machine controller needs to be modified to convert nmstate to `guestinfo.afterburn.initrd.network-kargs` upon cloning the template for a new machine.  An example of this is here: https://github.com/openshift/machine-api-operator/pull/1079

Required:

Nice to have:

ACCEPTANCE CRITERIA:

ENGINEERING DETAILS:

https://github.com/openshift/enhancements/pull/1267

Feature Overview

With this feature, MCE will be an additional operator ready to be enabled at cluster creation, for both the AI SaaS and disconnected installations with the Agent-based installer.

Currently 4 operators have been enabled for the Assisted Service SaaS create cluster flow: Local Storage Operator (LSO), OpenShift Virtualization (CNV), OpenShift Data Foundation (ODF), Logical Volume Manager (LVM)

The Agent-based installer doesn't leverage this framework yet.

Goals

When a user performs the creation of a new OpenShift cluster with the Assisted Installer (SaaS) or with the Agent-based installer (disconnected), provide the option to enable the multicluster engine (MCE) operator.

The cluster deployed can add itself to be managed by MCE.

Background, and strategic fit

Deploying an on-prem cluster 0 easily is a key operation for the rest of the OpenShift infrastructure.

While MCE/ACM are strategic in the lifecycle management of OpenShift, including the provisioning of all the clusters, the first cluster, where MCE/ACM are hosted along with other tools supporting the rest of the clusters (GitOps, Quay, log centralisation, monitoring...), must be easy to deploy and have a high success rate.

The Assisted Installer and the Agent-based installer cover this gap and must present the option to enable MCE to keep making progress in this direction.

Assumptions

MCE engineering is responsible for adding the appropriate definition as an olm-operator-plugin.

See https://github.com/openshift/assisted-service/blob/master/docs/dev/olm-operator-plugins.md for more details

Epic Goal

  • When an Assisted Service SaaS user performs the creation of a new OpenShift cluster, provide the option to enable the multicluster engine (MCE) operator.

Why is this important?

  • Expose users in the Assisted Service SaaS to the value of the MCE
  • Customers/users want to leverage the cluster lifecycle capabilities within MCE inside of their on premises environment.
  • The 'cluster0' can be initiated from Assisted Service SaaS and include MCE hub for cluster deployment within the customer datacenter.

Automated storage configuration

  • The Infrastructure Operator, a dependency of MCE for deploying bare metal, vSphere and Nutanix clusters, requires storage. There are 3 scenarios to automate storage:
  • User selects to install ODF and MCE:
    • ODF is the ideal storage for clusters but requires an additional subscription.
    • When selected along with MCE, it will be configured as the storage required by the Infrastructure Operator, and the Infrastructure Operator will be deployed along with MCE.
  • User deploys an SNO cluster, which supports LVMS as its storage and is available to all OpenShift users:
    • If the user also chooses ODF, then ODF is used for the Infrastructure Operator.
    • If ODF isn't configured, then LVMS is enabled and the Infrastructure Operator will use it.
  • User doesn't install ODF or deploy an SNO cluster:
    • They have to choose their storage and then install the Infrastructure Operator as a day-2 operation.

Scenarios

  1. When a RH cloud user logs into console.redhat SaaS, they can leverage the Assisted Service SaaS flow to create a new cluster
  2. During the Assisted Service SaaS create flow, a RH cloud user can see a list of available operators that they want to install at the same time as the cluster create. 
  3. An option is offered to check a box next to "multicluster engine for Kubernetes (MCE)"
  4. The RH cloud user can read a tool-tip or info-box with short description of the MCE and click a link for more details to review MCE documentation

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Ensure MCE release channel can automatically deploy the latest x.y.z without needing any DevOps/SRE intervention
  • Ensure MCE release channel can be updated quickly (if not automatically) to ensure the later release x.y can be offered to the cloud user.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. for example, CNV operator: https://github.com/openshift/assisted-service/blob/master/internal/operators/cnv/manifest.go#L165

Open questions:

  1. Is there any automation that will pickup the next stable-x.y MCE or do we need to manually do it with each release? For example, when MCE 2.2 comes out do we need to update the SaaS plugin code or does it automatically move to the next.  Note for example how the OLM subscription looks - and stable-2.2 will appear once MCE 2.2 comes out.
  2. How challenging is this to maintain as new OCP releases come out and QE must be performed? 

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

The authentication operator ignores noProxy settings defined in the cluster-wide proxy.

Expected outcome: when noProxy is set, the authentication operator should initialize connections through ingress instead of the cluster-wide proxy.

Background and Goal

Currently in OpenShift we do not support adding 3rd party agents and other software to cluster nodes. While rpm-ostree supports adding packages, we have no way today to do that in a sane, scalable way across MachineConfigPools and clusters. Some customers may not be able to meet their IT policies because of this.

In addition to third-party content, some customers may want to use the layering process as a point to inject configuration. The build process allows for simple copying of config files and the ability to run arbitrary scripts to set user config files (e.g. through an Ansible playbook). This should be a supported use case, except where it conflicts with OpenShift (for example, the MCO must continue to manage CRI-O and kubelet configs).
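As a rough sketch of the layering flow under discussion: a custom image is built outside the cluster from the RHCOS base image plus third-party RPMs and config, pushed to a registry, and then pointed at a MachineConfigPool. The osImageURL reference below is an assumption about how such an image would be applied, not a confirmed mechanism:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-custom-layer
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  # hypothetical image built from the RHCOS base plus vendor RPMs
  osImageURL: quay.io/example/rhcos-custom:latest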

Example Use Cases

  • Bare metal firmware update software that is packaged as an RPM
  • Host security monitors
  • Forensic tools
  • SIEM logging agents
  • SSH Key management
  • Device Drivers from OEM/ODM partners

Acceptance Criteria

  1. Administrators can deploy 3rd party repositories and packages to MachineConfigPools.
  2. Administrators can easily remove added packages and repository files.
  3. Administrators can manage system configuration files by copying files into the RHCOS build. [Note: if the same file is managed by the MCO, the MachineConfig version of the file is expected to "win" over the OS image version.]

Background

As part of enabling OCP CoreOS Layering for third party components, we will need to allow for package installation to /opt. Many OEMs and ISVs install to /opt and it would be difficult for them to make the change only for RHCOS. Meanwhile changing their RHEL target to a different target would also be problematic as their customers are expecting these tools to install in a certain way. Not having to worry about this path will provide the best ecosystem partner and customer experience.

Requirements

  • Document how 3rd party vendors can be compatible with our current offering.
  • Provide a mechanism for 3rd party vendors or their customers to provide information for exceptions that require an RPM to install binaries to /opt as an install target path.

Feature Overview (aka. Goal Summary)  

Add support for custom security groups to be attached to control plane and compute nodes at installation time.

Goals (aka. expected user outcomes)

Allow the user to provide existing security groups to be attached to the control plane and compute node instances at installation time.

Requirements (aka. Acceptance Criteria):

The user will be able to provide a list of existing security groups to the install config manifest that will be used as additional custom security groups to be attached to the control plane and compute node instances at installation time.

Out of Scope

The installer won't be responsible for creating any custom security groups; these must be created by the user before the installation starts.

Background

We do have users/customers with specific requirements on adding additional network rules to every instance created in AWS. For OpenShift these additional rules need to be added on day-2 manually as the Installer doesn't provide the ability to add custom security groups to be attached to any instance at install time.

MachineSets already support adding a list of existing custom security groups, so this could already be automated at install time by manually editing each MachineSet manifest before starting the installation; but even for these cases the Installer doesn't allow the user to provide this information to add the list of these security groups to the MachineSet manifests.

Documentation Considerations

Documentation will be required to explain how this information needs to be provided to the install config manifest as any other supported field.

Epic Goal

  • Allow the user to provide existing security groups to be attached to the control plane and compute node instances at installation time.

Why is this important?

  • We do have users/customers with specific requirements on adding additional network rules to every instance created in AWS. For OpenShift these additional rules need to be added on day-2 manually as the Installer doesn't provide the ability to add custom security groups to be attached to any instance at install time.

    MachineSets already support adding a list of existing custom security groups, so this could already be automated at install time by manually editing each MachineSet manifest before starting the installation; but even for these cases the Installer doesn't allow the user to provide this information to add the list of these security groups to the MachineSet manifests.

Scenarios

  1. The user will be able to provide a list of existing security groups to the install config that will be used as additional custom security groups to be attached to the control plane and compute node instances at installation time.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Previous Work (Optional):

  1. Compute Nodes managed by MAPI already support this feature

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a (user persona), I want to be able to:

  • Add custom security groups for compute nodes
  • Add custom security groups for control plane nodes

so that I can achieve

  • Control Plane and Compute nodes can support operational specific security rules. For instance: specific traffic may be required for compute vs control plane nodes.

Acceptance Criteria:

Description of criteria:

  • The control plane and compute machine sections of the install config accept user input as additionalSecurityGroupIDs (when using the aws platform).

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

additionalSecurityGroupIDs:
  description: AdditionalSecurityGroupIDs contains IDs of additional
    security groups for machines, where each ID is presented in the
    format sg-xxxx.
  items:
    type: string
  type: array
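For illustration, a hedged sketch of how this could be expressed in an install-config; the exact placement under the machine-pool aws platform section is an assumption:

controlPlane:
  name: master
  platform:
    aws:
      additionalSecurityGroupIDs:
        - sg-0123456789abcdef0    # pre-existing group for control plane traffic
compute:
  - name: worker
    platform:
      aws:
        additionalSecurityGroupIDs:
          - sg-0fedcba9876543210  # pre-existing group for compute traffic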

 

This requires/does not require a design proposal.

Feature Overview (aka. Goal Summary)  

Pod scaling in OpenShift depends heavily on the customer workload and hardware setup. Some workloads on certain hardware might not scale beyond 100 pods, while others might scale to 1,000 pods.

As an OpenShift admin, I want to monitor metrics that indicate why I am not able to scale my pods. Think of a pressure gauge that tells the customer when it is green (can scale) and when it is red (cannot scale).

As the OpenShift support team, if a customer calls in with a complaint about pod scaling, I should be able to check some metrics and tell them why they are not able to scale.

Goals (aka. expected user outcomes)

Metrics, alerts, and a dashboard

 

Requirements (aka. Acceptance Criteria):

Able to integrate these metrics and alerts into a monitoring dashboard

 

 


Epic Goal

  • To come up with a set of metrics that indicate optimal node resource usage.

Why is this important?

  • These metrics will help customers understand the capacity they have instead of restricting themselves to a hard-coded max pod limit.

Scenarios

  1. As the owner of an extremely high-capacity machine, I want to be able to deploy as many pods as my machine can handle.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. None

Previous Work (Optional):

  1. https://issues.redhat.com/browse/OCPNODE-1125

Open questions::

  1. The challenging part is coming up with a set of metrics that accurately indicate system resource usage.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

We need an operator to inject the dashboard jsonnet. For example, the etcd team injects their dashboard jsonnet via their operator in the form of a config map.

https://redhat-internal.slack.com/archives/C027U68LP/p1683574004805639?thread_ts=1683573783.216759&cid=C027U68LP

 

We will need a similar approach for the node dashboard.
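As a rough sketch (the namespace and label are assumptions based on how console dashboards are commonly injected), the operator could ship the rendered dashboard as a ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: node-resource-dashboard              # hypothetical name
  namespace: openshift-config-managed        # assumed target namespace
  labels:
    console.openshift.io/dashboard: "true"   # assumed label the console watches
data:
  node-resources.json: |
    { "title": "Node resource usage", "panels": [] }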

This feature follows up on OCPBU-186, Image mirroring by tags.

OCPBU-186 implemented the new ImageDigestMirrorSet and ImageTagMirrorSet APIs and their rollout through the MCO.

This feature will update the components using ImageContentSourcePolicy to use ImageDigestMirrorSet.

The list of the components: https://docs.google.com/document/d/11FJPpIYAQLj5EcYiJtbi_bNkAcJa2hCLV63WvoDsrcQ/edit?usp=sharing.
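For reference, a minimal ImageDigestMirrorSet of the kind components would consume instead of an ICSP (the mirror hostname is illustrative):

apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: example-idms
spec:
  imageDigestMirrors:
    - source: registry.redhat.io/openshift4
      mirrors:
        - mirror.example.com/openshift4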

 

Migrate OpenShift Components to use the new Image Digest Mirror Set (IDMS)

This doc lists the OpenShift components that currently use ICSP: https://docs.google.com/document/d/11FJPpIYAQLj5EcYiJtbi_bNkAcJa2hCLV63WvoDsrcQ/edit?usp=sharing

Plan for ImageDigestMirrorSet Rollout
Epic: https://issues.redhat.com/browse/OCPNODE-521

4.13: Enable ImageDigestMirrorSet, both ICSP and ImageDigestMirrorSet objects are functional

  • Document that ICSP is being deprecated and will be unsupported by 4.17 (to allow for EUS to EUS upgrades)
  • Reject write to both ICSP and ImageDigestMirrorSet on the same cluster

4.14: Update OpenShift components to use IDMS

4.17: Remove support for ICSP within MCO

  • Error out if an old ICSP object is used

As an OpenShift developer, I want an --idms-file flag so that I can fetch image info from an alternative mirror when --icsp-file gets deprecated.

As an <openshift developer> trying to <mirror images for a disconnected environment using the oc command>, I want <the output to give an example of an ImageDigestMirrorSet manifest> because ImageContentSourcePolicy will be replaced by the CRD implemented in OCPBU-186, Image mirroring by tags.

The ImageContentSourcePolicy manifest snippet from the command output will be updated to an ImageDigestMirrorSet manifest.

Workflows that use the `oc adm release mirror` command will be impacted.

 

 

Feature Overview

Create a GCP cloud specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags (or labels in GCP) on any OpenShift cloud resource that we create and manage. The behaviour should also tag existing resources that do not have the tags yet, and once the tags in the infrastructure CRD are changed, all the resources should be updated accordingly.

Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.

Due to the ongoing intree/out of tree split on the cloud and CSI providers, this should not apply to clusters with intree providers (!= "external").

Once confident we have all components updated, we should introduce an end2end test that makes sure we never create resources that are untagged.

 
Goals

  • Functionality on GCP Tech Preview
  • inclusion in the cluster backups
  • flexibility of changing tags during cluster lifetime, without recreating the whole cluster

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

This epic covers the work to apply user-defined labels to GCP resources created for the OpenShift cluster, available as Tech Preview.

The user should be able to define GCP labels to be applied to the resources created during cluster creation by the installer and by the other operators that manage those specific resources. The user defines the required tags/labels in install-config.yaml while preparing the inputs for cluster creation; these are then made available in the status sub-resource of the Infrastructure custom resource, which cannot be edited but is available for user reference and is used by the in-cluster operators to label resources as they are created.

Updating or deleting labels added during cluster creation, or adding new labels as a day-2 operation, is out of scope for this epic.

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

Reference - https://issues.redhat.com/browse/RFE-2017

Enhancement proposed for GCP labels support in OCP requires machine-api-provider-gcp to add the GCP userLabels, available in the status sub-resource of the Infrastructure CR, to the GCP virtual machine resources and the sub-resources it creates.

Acceptance Criteria

  • Code linting, validation and best practices adhered to
  • UTs and e2e are added/updated

Enhancement proposed for GCP labels support in OCP requires the install-config CRD to be updated to include gcp userLabels for the user to configure, which will be referred to by the installer to apply the list of labels to each resource it creates, and will also be made available in the Infrastructure CR it generates.

Below is the snippet of the change required in the CRD

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata: 
  name: installconfigs.install.openshift.io
spec: 
  versions: 
  - name: v1
    schema: 
      openAPIV3Schema: 
        properties: 
          platform: 
            properties: 
              gcp: 
                properties: 
                  userLabels:
                    additionalProperties:
                      type: string
                    description: UserLabels additional keys and values that the installer
                      will add as labels to all resources that it creates. Resources
                      created by the cluster itself may not include these labels.
                    type: object

This change is required for testing the changes of the feature, and should ideally get merged first.
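Under the schema above, an install-config entry might look roughly like this (values are illustrative; the final representation of userLabels may differ):

platform:
  gcp:
    projectID: example-project
    region: us-central1
    userLabels:
      environment: dev
      cost-center: engineering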

Acceptance Criteria

  • Code linting, validation and best practices adhered to
  • User should be able to configure gcp user defined labels in the install-config.yaml
  • Fields descriptions

Enhancement proposed for GCP labels and tags support in OCP requires making use of the latest APIs made available in the Terraform provider for Google, and requires an update to use them.

Acceptance Criteria

  • Code linting, validation and best practices adhered to.

The installer generates the Infrastructure CR in the manifests-creation step of the cluster creation process, based on the user-provided input recorded in install-config.yaml. While generating the Infrastructure CR, platformStatus.gcp.resourceLabels should be updated with the user-provided labels (installconfig.platform.gcp.userLabels).
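A sketch of the resulting status stanza, assuming resourceLabels is a list of key/value pairs (the exact shape may differ):

status:
  platformStatus:
    type: GCP
    gcp:
      resourceLabels:
        - key: environment
          value: dev
        - key: cost-center
          value: engineering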

Acceptance Criteria

  • Code linting, validation and best practices adhered to
  • Infrastructure CR created by installer should have gcp user defined labels if any, in status field.

cluster-config-operator makes the Infrastructure CRD available for the installer; the CRD is included in its container image from the openshift/api package, and the package needs to be updated to have the latest CRD.

Enhancement proposed for GCP tags support in OCP requires cluster-image-registry-operator to add the GCP userTags, available in the status sub-resource of the Infrastructure CR, to the GCP storage resources it creates.

cluster-image-registry-operator uses the createStorageAccount() method to create the storage resource, which should be updated to add tags after resource creation.

Acceptance Criteria

  • Code linting, validation and best practices adhered to
  • UTs and e2e are added/updated

The installer creates the list of GCP resources below during the create-cluster phase, and these resources should have the user-defined labels applied, as well as the default OCP label kubernetes-io-cluster-<cluster_id>:owned.

Resources List

Resource | Terraform API
VM Instance | google_compute_instance
Image | google_compute_image
Address | google_compute_address (beta)
ForwardingRule | google_compute_forwarding_rule (beta)
Zones | google_dns_managed_zone
Storage Bucket | google_storage_bucket

Acceptance Criteria:

  • Code linting, validation and best practices adhered to
  • List of gcp resources created by installer should have user defined labels and as well as the default OCP label.

Enhancement proposed for GCP labels support in OCP requires cluster-image-registry-operator to add the GCP userLabels, available in the status sub-resource of the Infrastructure CR, to the GCP storage resources it creates.

cluster-image-registry-operator uses the createStorageAccount() method to create the storage resource, which should be updated to add labels.

Acceptance Criteria

  • Code linting, validation and best practices adhered to
  • UTs and e2e are added/updated

Feature Overview  

Much like for core OpenShift operators, a standardized flow exists for OLM-managed operators to interact with the cluster in a specific way to leverage AWS STS authorization when using AWS APIs, as opposed to insecure static, long-lived credentials. OLM-managed operators can implement integration with the CloudCredentialOperator in a well-defined way to support this flow.

Goals:

Enable customers to easily leverage OpenShift's capabilities around AWS STS with layered products, for increased security posture. Enable OLM-managed operators to implement support for this in well-defined pattern.

Requirements:

  • CCO gets a new mode in which it can reconcile STS credential request for OLM-managed operators
  • A standardized flow is leveraged to guide users in discovering and preparing their AWS IAM policies and roles with permissions that are required for OLM-managed operators 
  • A standardized flow is defined in which users can configure OLM-managed operators to leverage AWS STS
  • An example operator is used to demonstrate the end2end functionality
  • Clear instructions and documentation for operator development teams to implement the required interaction with the CloudCredentialOperator to support this flow

Use Cases:

See Operators & STS slide deck.

 

Out of Scope:

  • handling OLM-managed operator updates in which AWS IAM permission requirements might change from one version to another (which requires user awareness and intervention)

 

Background:

The CloudCredentialOperator already provides a powerful API for OpenShift's cluster core operators to request credentials and acquire them via short-lived tokens. This capability should be expanded to OLM-managed operators, specifically to Red Hat layered products that interact with AWS APIs. The process today ranges from cumbersome to non-existent depending on the operator in question, and is seen as an adoption blocker for OpenShift on AWS.
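To illustrate the intended pattern, a hedged sketch of a CredentialsRequest an OLM-managed operator might create; the STS-specific fields (cloudTokenPath, stsIAMRoleARN) are assumptions about how the CCO API could be extended, not confirmed names:

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: example-operator-creds
  namespace: openshift-cloud-credential-operator
spec:
  serviceAccountNames:
    - example-operator
  secretRef:
    name: example-operator-aws-creds
    namespace: example-operator-ns
  cloudTokenPath: /var/run/secrets/openshift/serviceaccount/token    # assumed field
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    stsIAMRoleARN: arn:aws:iam::123456789012:role/example-operator   # assumed field
    statementEntries:
      - effect: Allow
        action:
          - s3:GetObject
        resource: "*"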

 

Customer Considerations

This is particularly important for ROSA customers. Customers are expected to be asked to pre-create the required IAM roles outside of OpenShift, which is deemed acceptable.

Documentation Considerations

  • Internal documentation needs to exists to guide Red Hat operator developer teams on the requirements and proposed implementation of integration with CCO and the proposed flow
  • External documentation needs to exist to guide users on:
    • how to become aware that the cluster is in STS mode
    • how to become aware of operators that support STS and the proposed CCO flow
    • how to become aware of the IAM permissions requirements of these operators
    • how to configure an operator in the proposed flow to interact with CCO

Interoperability Considerations

  • this needs to work with ROSA
  • this needs to work with self-managed OCP on AWS

Market Problem

This Section: High-Level description of the Market Problem ie: Executive Summary

  • As a customer of OpenShift layered products, I need to be able to fluidly, reliably and consistently install and use OpenShift layered product Kubernetes Operators in my ROSA STS clusters, while keeping an STS workflow throughout.
  • As a customer of OpenShift on the big cloud providers, overall I expect OpenShift as a platform to function equally well with tokenized cloud auth as it does with "mint-mode" IAM credentials. I expect the same from the Kubernetes Operators under the Red Hat brand (that need to reach cloud APIs), in that tokenized workflows are equally integrated and workable as with "mint-mode" IAM credentials.
  • As the managed services teams, including the Hypershift team, offering a downstream opinionated, supported and managed lifecycle of OpenShift (in the forms of ROSA, ARO, OSD on GCP, Hypershift, etc.), the OpenShift platform should have as close to native integration as possible with core platform operators when clusters use tokenized cloud auth, driving the use of layered products.
  • As the Hypershift team, where the only credential mode for clusters/customers is STS (on AWS), the Red Hat branded Operators that must reach the AWS API should be enabled to work with STS credentials in a consistent and automated fashion that allows customers to use those operators as easily as possible, driving the use of layered products.

Why it Matters

  • Adding consistent, automated layered product integrations to OpenShift would provide great added value to OpenShift as a platform, and its downstream offerings in Managed Cloud Services and related offerings.
  • Enabling Kubernetes Operators (at first, Red Hat ones) on OpenShift for the "big 3" cloud providers is a key differentiation and security requirement that our customers have been and continue to demand.
  • HyperShift is an STS-only architecture, which means that if our layered offerings via Operators cannot easily work with STS, then it would be blocking us from our broad product adoption goals.

Illustrative User Stories or Scenarios

  1. Main success scenario - high-level user story
    1. customer creates a ROSA STS or Hypershift cluster (AWS)
    2. customer wants basic (table-stakes) features such as AWS EFS or RHODS or Logging
    3. customer sees necessary tasks for preparing for the operator in OperatorHub from their cluster
    4. customer prepares AWS IAM/STS roles/policies in anticipation of the Operator they want, using what they get from OperatorHub
    5. customer provides a very minimal set of parameters (AWS ARN of role(s) with policy) to the Operator's OperatorHub page
    6. The cluster can automatically setup the Operator, using the provided tokenized credentials and the Operator functions as expected
    7. Cluster and Operator upgrades are taken into account and automated
    8. The above steps 1-7 should apply similarly for Google Cloud and Microsoft Azure Cloud, with their respective token-based workload identity systems.
  2. Alternate flow/scenarios - high-level user stories
    1. The same as above, but the ROSA CLI would assist with AWS role/policy management
    2. The same as above, but the oc CLI would assist with cloud role/policy management (per respective cloud provider for the cluster)
  3. ...

Expected Outcomes

This Section: Articulates and defines the value proposition from a users point of view

  • See SDE-1868 as an example of what is needed, including design proposed, for current-day ROSA STS and by extension Hypershift.
  • Further research is required to accommodate the AWS STS equivalent systems of GCP and Azure
  • Order of priority at this time is
    • 1. AWS STS for ROSA and ROSA via HyperShift
    • 2. Microsoft Azure for ARO
    • 3. Google Cloud for OpenShift Dedicated on GCP

Effect

This Section: Effect is the expected outcome within the market. There are two dimensions of outcomes; growth or retention. This represents part of the “why” statement for a feature.

  • Growth is the acquisition of net new usage of the platform. This can be new workloads not previously able to be supported, new markets not previously considered, or new end users not previously served.
  • Retention is maintaining and expanding existing use of the platform. This can be more effective use of tools, competitive pressures, and ease of use improvements.
  • Both of growth and retention are the effect of this effort.
    • Customers have strict requirements around using only token-based cloud credential systems for workloads in their cloud accounts, which include OpenShift clusters in all forms.
      • We gain new customers from both those that have waited for token-based auth/auth from OpenShift and from those that are new to OpenShift, with strict requirements around cloud account access
      • We retain customers that are going thru both cloud-native and hybrid-cloud journeys that all inevitably see security requirements driving them towards token-based auth/auth.

References

As an engineer, I want the capability to implement CI test cases that run at different intervals (daily, weekly) to ensure that downstream operators depending on certain capabilities are not negatively impacted when systems that CCO interacts with change behavior.

Acceptance Criteria:

Create a stubbed-out e2e test path in CCO and matching e2e calling code in the release repository, so that there is a path to tests that verify CCO works in an AWS STS workflow.
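A hedged sketch of what such a periodic could look like as a ci-operator test stanza in the release repository (test name, cadence, and workflow are assumptions, not the final wiring):

tests:
- as: e2e-aws-sts-periodic       # hypothetical stubbed e2e test entry
  cron: "0 6 * * 1"              # weekly run; a daily variant would use its own cron
  steps:
    cluster_profile: aws
    workflow: openshift-e2e-aws  # placeholder workflow until the STS-specific one exists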

oc-mirror is a GA product as of OpenShift 4.11.

The goal of this feature is to address future customer requests for new features or capabilities in oc-mirror.

In the 4.12 release, a new feature was introduced to oc-mirror allowing it to use OCI FBC catalogs as a starting point for mirroring operators.

Overview

As an oc-mirror user, I would like the OCI FBC feature to be stable
so that I can use it in a production-ready environment
and so that the new feature works seamlessly with all existing features of oc-mirror.

Current Status

This feature is ring-fenced in the oc-mirror repository; it uses the following flags so as not to cause any breaking changes in the current oc-mirror functionality:

  • --use-oci-feature
  • --oci-feature-action (copy or mirror)
  • --oci-registries-config

The OCI FBC (file-based catalog) format has been delivered as Tech Preview in 4.12.

Tech Enablement slides can be found here https://docs.google.com/presentation/d/1jossypQureBHGUyD-dezHM4JQoTWPYwiVCM3NlANxn0/edit#slide=id.g175a240206d_0_7

Design doc is in https://docs.google.com/document/d/1-TESqErOjxxWVPCbhQUfnT3XezG2898fEREuhGena5Q/edit#heading=h.r57m6kfc2cwt (also contains latest design discussions around the stories of this epic)

Link to previous working epic https://issues.redhat.com/browse/CFE-538

Contacts for the OCI FBC feature

 

Feature Overview (aka. Goal Summary)  

The OpenShift Assisted Installer is a user-friendly OpenShift installation solution for the various platforms, but focused on bare metal. This very useful functionality should be made available for the IBM zSystem platform.

 

Goals (aka. expected user outcomes)

Use of the OpenShift Assisted Installer to install OpenShift on an IBM zSystem

 

Requirements (aka. Acceptance Criteria):

Using the OpenShift Assisted Installer to install OpenShift on an IBM zSystem 

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

As a multi-arch development engineer, I would like to ensure that the Assisted Installer workflow is fully functional and supported for z/VM deployments.

Acceptance Criteria

  • Feature is implemented, tested, QE, documented, and technically enabled.
  • Stories closed.

Description of the problem:

DASD devices are not recognized correctly when attached to and used for a zVM node.
<see attached screenshot>

Attach two DASD devices to a zVM node. Create a cluster and boot the zVM node into the discovery service. The Host discovery panel shows an error for the discovered host.

Steps to reproduce:

1. Attach two DASD devices to the zVM.

2. Create new cluster using the AI UI and configure discovery image

3. Boot zVM node 

4. Wait until the node shows up on the Host discovery panel.

5. DASD devices are not recognized as a valid option

Actual results:

DASD devices can't be used as an installable disk

Expected results:
DASD devices can be used for installation. The user can choose which device AI will install to.

A regression was discovered on staging where the default is set to minimal ISO, preventing installation of OCP 4.13 for the s390x architecture.

See the following older bug, which appears to address the same issue:

  1. MGMT-14298

 

Description of the problem:

Using FCP (multipath) devices for a zVM node with the following parmline:

rd.neednet=1 console=ttysclp0 coreos.live.rootfs_url=http://172.23.236.156:8080/assisted-installer/rootfs.img ip=10.14.6.8::10.14.6.1:255.255.255.0:master-0:encbdd0:none nameserver=10.14.6.1 ip=[fd00::8]::[fd00::1]:64::encbdd0:none nameserver=[fd00::1] zfcp.allow_lun_scan=0 rd.znet=qeth,0.0.bdd0,0.0.bdd1,0.0.bdd2,layer2=1 rd.zfcp=0.0.8007,0x500507630400d1e3,0x4000401e00000000 rd.zfcp=0.0.8107,0x50050763040851e3,0x4000401e00000000 random.trust_cpu=on rd.luks.options=discard ignition.firstboot ignition.platform.id=metal console=tty1 console=ttyS1,115200n8

shows a disk limitation error in the UI.

<see attached image>

How reproducible:

Attach two FCP devices to a zVM node. Create a cluster and boot zVM node into discovery service. Host discovery panel shows an error for discovered host.

Steps to reproduce:

1. Attach two FCP devices to the zVM.

2. Create new cluster using the AI UI and configure discovery image

3. Boot zVM node 

4. Wait until the node shows up on the Host discovery panel.

5. FCP devices are not recognized as a valid option

Actual results:

FCP devices can't be used as an installable disk

Expected results:
FCP devices can be used for installation (multipath must be activated after installation:
https://docs.openshift.com/container-platform/4.13/post_installation_configuration/ibmz-post-install.html#enabling-multipathing-fcp-luns_post-install-configure-additional-devices-ibmz)

Feature Overview (aka. Goal Summary)  

Due to low customer interest in using OpenShift on Alibaba Cloud, we have decided to deprecate and then remove IPI support for Alibaba Cloud.

https://docs.google.com/document/d/1Kp-GrdSHqsymzezLCm0bKrCI71alup00S48QeWFa0q8/edit#heading=h.v75efohim75y 

Goals (aka. expected user outcomes)

4.14

Announcement 

  1. Update cloud.redhat.com with deprecation information 
  2. Update IPI installer code with warning
  3. Update release notes with deprecation information
  4. Update OpenShift docs with deprecation information

4.15

Archive code 

 

Add a deprecation warning in the installer code for anyone trying to install Alibaba via IPI

USER STORY:

As a user of the installer binary, I want to be warned that Alibaba support will be deprecated in 4.15, so that I'm prevented from creating clusters that will soon be unsupported.

DESCRIPTION:

Alibaba support will be decommissioned from both IPI and UPI starting in 4.15. We want to warn users of the 4.14 installer binary who pick 'alibabacloud' from the list of providers.

ACCEPTANCE CRITERIA:

A warning message is displayed after choosing 'alibabacloud'.

ENGINEERING DETAILS:

https://docs.google.com/document/d/1Kp-GrdSHqsymzezLCm0bKrCI71alup00S48QeWFa0q8/edit?usp=sharing_eip_m&ts=647df877

 

Epic Goal

As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers

Why is this important?

Customers want to use their own load balancers, while IPI comes with built-in LBs based on keepalived and HAProxy.

Scenarios

  1. A large deployment routed across multiple failure domains without stretched L2 networks would require dynamically routing the control plane VIP traffic through load balancers capable of living in multiple L2 segments.
  2. Customers who want to use their existing LB appliances for the control plane.

Acceptance Criteria

  • Should we require the support of migration from internal to external LB?
  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • QE - must test a scenario where we disable the internal LB, set up an external LB, and the OCP deployment runs fine.
  • Documentation - we need to document all the gotchas regarding this type of deployment, even the specifics about the load-balancer itself (routing policy, dynamic routing, etc)

Dependencies (internal and external)

  1. Fixed IPs would be very interesting to support, already WIP by vsphere (need to Spike on this): https://issues.redhat.com/browse/OCPBU-179
  2. Confirm with customers that they are ok with external LB or they prefer a new internal LB that supports BGP

Previous Work:

vsphere has done the work already via https://issues.redhat.com/browse/SPLAT-409

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

As an OpenShift infrastructure owner I need to deploy OCP on OpenStack with the installer-provisioned infrastructure workflow and configure my own load balancers

Why is this important?

Customers want to use their own load balancers, while IPI comes with built-in LBs based on keepalived and HAProxy.

Scenarios

  1. A large deployment routed across multiple failure domains without stretched L2 networks would require dynamically routing the control plane VIP traffic through load balancers capable of living in multiple L2 segments.
  2. Customers who want to use their existing LB appliances for the control plane.

Acceptance Criteria

  • Should we require the support of migration from internal to external LB?
  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • QE - must test a scenario where we disable the internal LB, set up an external LB, and the OCP deployment runs fine.
  • Documentation - we need to document all the gotchas regarding this type of deployment, even the specifics about the load-balancer itself (routing policy, dynamic routing, etc)

Dependencies (internal and external)

  1. Fixed IPs would be very interesting to support, already WIP by vsphere (need to Spike on this): https://issues.redhat.com/browse/OCPBU-179
  2. Confirm with customers that they are ok with external LB or they prefer a new internal LB that supports BGP

Previous Work:

vsphere has done the work already via https://issues.redhat.com/browse/SPLAT-409

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

Notes: https://github.com/EmilienM/ansible-role-routed-lb is an example of an LB that will be used for CI and can be used by QE and customers.
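For reference, a hedged install-config sketch of pointing such a cluster at a user-managed load balancer (field names follow the pattern used by other platforms for this kind of work and may differ in the final API; VIPs and cloud name are illustrative):

platform:
  openstack:
    cloud: mycloud                 # clouds.yaml entry, illustrative
    apiVIPs:
    - 192.0.2.10                   # VIP fronted by the external load balancer
    ingressVIPs:
    - 192.0.2.11
    loadBalancer:
      type: UserManaged            # opt out of the built-in keepalived/haproxy LB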


Feature Overview

Goals

  • Support OpenShift to be deployed from day-0 on AWS Local Zones
  • Support an existing OpenShift cluster to deploy compute Nodes on AWS Local Zones (day-2)

AWS Local Zones support - feature delivered in phases:

  • Phase 0 (OCPPLAN-9630): Document how to create compute nodes on AWS Local Zones in day-0 (SPLAT-635)
  • Phase 1 (OCPBU-2): Create an edge compute pool to generate MachineSets for nodes with NoSchedule taints when installing a cluster in an existing VPC with AWS Local Zone subnets (SPLAT-636); see the sketch after this list
  • Phase 2 (OCPBU-351): Installer automates network resources creation on Local Zone based on the edge compute pool (SPLAT-657)
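For illustration, a hedged sketch of the edge compute pool from Phase 1 as it might appear in install-config (zone name and instance type are illustrative):

compute:
- name: edge                       # the edge machine pool targeting Local Zones
  replicas: 1
  platform:
    aws:
      type: m5.2xlarge             # illustrative instance type available in the zone
      zones:
      - us-east-1-nyc-1a           # illustrative AWS Local Zone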

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 


Feature Overview

Testing is one of the main pillars of production-grade software. It helps validate and flag issues early, before the code is shipped into production environments. Code changes, no matter how small, might lead to bugs and outages; the best way to catch bugs is to write proper tests, and to run those tests we need a foundation for test infrastructure. Finally, to close the circle, automating these tests and their corresponding builds helps reduce errors and save a lot of time.

Goal(s)

  • How do we get infrastructure, what infrastructure accounts are required?
  • Build e2e integration with openshift-release on AWS.
  • Define MVP CI jobs to validate (e.g., conformance). Which tests are failing? Are we skipping any, and why?

Note: Syncing with the Developer Productivity teams might be required to understand infra requirements, especially for our first HyperShift infrastructure backend, AWS.

Context:

This is a placeholder epic to capture all the e2e scenarios that we want to test in CI in the long term. Anything which is a TODO here should at minimum be validated by QE as it is developed.

DoD:

Every supported scenario is e2e CI tested.

Scenarios:

  • Hypershift deployment with services as routes.
  • Hypershift deployment with services as NodePorts.

 

DoD:

Refactor the E2E tests following new pattern with 1 HostedCluster and targeted NodePools:

  • nodepool_upgrade_test.go

 

Goal

Productize agent-installer-utils container from https://github.com/openshift/agent-installer-utils

Feature Description

In order to ship the network reconfiguration feature, it would be useful to move the agent-tui to its own image instead of sharing the agent-installer-node-agent one.

Currently the `agent create image` command takes care of extracting the agent-tui binary (and required libs) from the `assisted-installer-agent` image (shipped in the release as `agent-installer-node-agent`).
Once the agent-tui is available from the `agent-installer-utils` image instead, the installer code will need to be updated accordingly (see https://github.com/openshift/installer/blob/56e85bee78490c18aaf33994e073cbc16181f66d/pkg/asset/agent/image/agentimage.go#L81).

agent-tui is currently built and shipped from the assisted-installer-agent repo. Since it will be moved into its own repository (agent-installer-utils), it is necessary to clean up the previous code.

Feature Overview

Allow users to interactively adjust the network configuration for a host after booting the agent ISO.

Goals

Configure network after host boots

The user has Static IPs, VLANs, and/or bonds to configure, but has no idea of the device names of the NICs. They don't enter any network config in agent-config.yaml. Instead they configure each host's network via the text console after it boots into the image.

Epic Goal

  • Allow users to interactively adjust the network configuration for a host after booting the agent ISO, before starting processes that pull container images.

Why is this important?

  • Configuring the network prior to booting a host is difficult and error-prone. Not only is the nmstate syntax fairly arcane, but the advent of 'predictable' interface names means that interfaces retain the same name across reboots but it is nearly impossible to predict what they will be. Applying configuration to the correct hosts requires correct knowledge and input of MAC addresses. All of these present opportunities for things to go wrong, and when they do the user is forced to return to the beginning of the process and generate a new ISO, then boot all of the hosts in the cluster with it again.

Scenarios

  1. The user has Static IPs, VLANs, and/or bonds to configure, but has no idea of the device names of the NICs. They don't enter any network config in agent-config.yaml. Instead they configure each host's network via the text console after it boots into the image.
  2. The user has Static IPs, VLANs, and/or bonds to configure, but makes an error entering the configuration in agent-config.yaml so that (at least) one host will not be able to pull container images from the release payload. They correct the configuration for that host via the text console before proceeding with the installation.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Currently the agent-tui always displays the additional checks (nslookup/ping/HTTP GET), even when the primary check (pull image) passes. This may cause some confusion for the user, because the additional checks do not prevent the agent-tui from completing successfully; they are just informative, to allow better troubleshooting of an issue (so they are not needed in the positive case).

The additional checks should therefore be shown only when the primary check fails for any reason.

The openshift-install agent create image command will need to fetch the agent-tui executable so that it can be embedded within the agent ISO. For this reason the agent-tui must be available in the release payload, so that it can be retrieved even when the command is invoked in a disconnected environment.
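A hedged way to confirm the image is present in a given payload, assuming the payload tag matches the image name (release pullspec illustrative):

oc adm release info --image-for=agent-installer-utils quay.io/openshift-release-dev/ocp-release:4.14.0-x86_64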

When the agent-tui is shown during the initial host boot, if the pull release image check fails then an additional checks box is shown along with a details text view.
The content of the details view gets continuously updated with the details of the failed check, but the user cannot move the focus over the details box (using the arrow/tab keys), and thus cannot scroll its content (using the up/down arrow keys).

When the UI is active in the console, event messages that are generated will distort the interface and make it difficult for the user to view the configuration and select options. An example is shown in the attached screenshot.

Epic Goal

Full support of North-South (cluster egress-ingress) IPsec that shares an encryption back-end with the current East-West implementation, allows for IPsec offload to capable SmartNICs, can be enabled and disabled at runtime, and allows for FIPS compliance (including install-time configuration and disabling of runtime configuration).

Why is this important?

  • Customers want end-to-end default encryption with external servers and/or clients.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Must allow for the possibility of offloading the IPsec encryption to a SmartNIC.

Dependencies (internal and external)

  1.  

Related:

  • ITUP-44 - OpenShift support for North-South OVN IPSec
  • HATSTRAT-33 - Encrypt All Traffic to/from Cluster (aka IPSec as a Service)

Previous Work (Optional):

  1. SDN-717 - Support IPSEC on ovn-kubernetes

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

This is a clone of issue OCPBUGS-17380. The following is the description of the original issue:

Description of problem:

Enable IPSec pre/post install on OVN IC cluster

$ oc patch networks.operator.openshift.io cluster --type=merge -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{ }}}}}'
network.operator.openshift.io/cluster patched


ovn-ipsec containers complaining:

ovs-monitor-ipsec | ERR | Failed to import certificate into NSS.
b'certutil:  unable to open "/etc/openvswitch/keys/ipsec-cacert.pem" for reading (-5950, 2).\n'



$ oc rsh ovn-ipsec-d7rx9
Defaulted container "ovn-ipsec" out of: ovn-ipsec, ovn-keys (init)
sh-5.1# certutil -L -d /var/lib/ipsec/nss
Certificate Nickname                                         Trust Attributes
                                                             SSL,S/MIME,JAR/XPI
ovs_certkey_db961f9a-7de4-4f1d-a2fb-a8306d4079c5             u,u,u

sh-5.1# cat /var/log/openvswitch/libreswan.log
Aug  4 15:12:46.808394: Initializing NSS using read-write database "sql:/var/lib/ipsec/nss"
Aug  4 15:12:46.837350: FIPS Mode: NO
Aug  4 15:12:46.837370: NSS crypto library initialized
Aug  4 15:12:46.837387: FIPS mode disabled for pluto daemon
Aug  4 15:12:46.837390: FIPS HMAC integrity support [disabled]
Aug  4 15:12:46.837541: libcap-ng support [enabled]
Aug  4 15:12:46.837550: Linux audit support [enabled]
Aug  4 15:12:46.837576: Linux audit activated
Aug  4 15:12:46.837580: Starting Pluto (Libreswan Version 4.9 IKEv2 IKEv1 XFRM XFRMI esp-hw-offload FORK PTHREAD_SETSCHEDPRIO GCC_EXCEPTIONS NSS (IPsec profile) (NSS-KDF) DNSSEC SYSTEMD_WATCHDOG LABELED_IPSEC (SELINUX) SECCOMP LIBCAP_NG LINUX_AUDIT AUTH_PAM NETWORKMANAGER CURL(non-NSS) LDAP(non-NSS)) pid:147
Aug  4 15:12:46.837583: core dump dir: /run/pluto
Aug  4 15:12:46.837585: secrets file: /etc/ipsec.secrets
Aug  4 15:12:46.837587: leak-detective enabled
Aug  4 15:12:46.837589: NSS crypto [enabled]
Aug  4 15:12:46.837591: XAUTH PAM support [enabled]
Aug  4 15:12:46.837604: initializing libevent in pthreads mode: headers: 2.1.12-stable (2010c00); library: 2.1.12-stable (2010c00)
Aug  4 15:12:46.837664: NAT-Traversal support  [enabled]
Aug  4 15:12:46.837803: Encryption algorithms:
Aug  4 15:12:46.837814:   AES_CCM_16         {256,192,*128} IKEv1:     ESP     IKEv2:     ESP     FIPS              aes_ccm, aes_ccm_c
Aug  4 15:12:46.837820:   AES_CCM_12         {256,192,*128} IKEv1:     ESP     IKEv2:     ESP     FIPS              aes_ccm_b
Aug  4 15:12:46.837826:   AES_CCM_8          {256,192,*128} IKEv1:     ESP     IKEv2:     ESP     FIPS              aes_ccm_a
Aug  4 15:12:46.837831:   3DES_CBC           [*192]         IKEv1: IKE ESP     IKEv2: IKE ESP     FIPS NSS(CBC)     3des
Aug  4 15:12:46.837837:   CAMELLIA_CTR       {256,192,*128} IKEv1:     ESP     IKEv2:     ESP                      
Aug  4 15:12:46.837843:   CAMELLIA_CBC       {256,192,*128} IKEv1: IKE ESP     IKEv2: IKE ESP          NSS(CBC)     camellia
Aug  4 15:12:46.837849:   AES_GCM_16         {256,192,*128} IKEv1:     ESP     IKEv2: IKE ESP     FIPS NSS(GCM)     aes_gcm, aes_gcm_c
Aug  4 15:12:46.837855:   AES_GCM_12         {256,192,*128} IKEv1:     ESP     IKEv2: IKE ESP     FIPS NSS(GCM)     aes_gcm_b
Aug  4 15:12:46.837861:   AES_GCM_8          {256,192,*128} IKEv1:     ESP     IKEv2: IKE ESP     FIPS NSS(GCM)     aes_gcm_a
Aug  4 15:12:46.837867:   AES_CTR            {256,192,*128} IKEv1: IKE ESP     IKEv2: IKE ESP     FIPS NSS(CTR)     aesctr
Aug  4 15:12:46.837872:   AES_CBC            {256,192,*128} IKEv1: IKE ESP     IKEv2: IKE ESP     FIPS NSS(CBC)     aes
Aug  4 15:12:46.837878:   NULL_AUTH_AES_GMAC {256,192,*128} IKEv1:     ESP     IKEv2:     ESP     FIPS              aes_gmac
Aug  4 15:12:46.837883:   NULL               []             IKEv1:     ESP     IKEv2:     ESP                      
Aug  4 15:12:46.837889:   CHACHA20_POLY1305  [*256]         IKEv1:             IKEv2: IKE ESP          NSS(AEAD)    chacha20poly1305
Aug  4 15:12:46.837892: Hash algorithms:
Aug  4 15:12:46.837896:   MD5                               IKEv1: IKE         IKEv2:                  NSS         
Aug  4 15:12:46.837901:   SHA1                              IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha
Aug  4 15:12:46.837906:   SHA2_256                          IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha2, sha256
Aug  4 15:12:46.837910:   SHA2_384                          IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha384
Aug  4 15:12:46.837915:   SHA2_512                          IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha512
Aug  4 15:12:46.837919:   IDENTITY                          IKEv1:             IKEv2:             FIPS             
Aug  4 15:12:46.837922: PRF algorithms:
Aug  4 15:12:46.837927:   HMAC_MD5                          IKEv1: IKE         IKEv2: IKE              native(HMAC) md5
Aug  4 15:12:46.837931:   HMAC_SHA1                         IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha, sha1
Aug  4 15:12:46.837936:   HMAC_SHA2_256                     IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha2, sha256, sha2_256
Aug  4 15:12:46.837950:   HMAC_SHA2_384                     IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha384, sha2_384
Aug  4 15:12:46.837955:   HMAC_SHA2_512                     IKEv1: IKE         IKEv2: IKE         FIPS NSS          sha512, sha2_512
Aug  4 15:12:46.837959:   AES_XCBC                          IKEv1:             IKEv2: IKE              native(XCBC) aes128_xcbc
Aug  4 15:12:46.837962: Integrity algorithms:
Aug  4 15:12:46.837966:   HMAC_MD5_96                       IKEv1: IKE ESP AH  IKEv2: IKE ESP AH       native(HMAC) md5, hmac_md5
Aug  4 15:12:46.837984:   HMAC_SHA1_96                      IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS          sha, sha1, sha1_96, hmac_sha1
Aug  4 15:12:46.837995:   HMAC_SHA2_512_256                 IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS          sha512, sha2_512, sha2_512_256, hmac_sha2_512
Aug  4 15:12:46.837999:   HMAC_SHA2_384_192                 IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS          sha384, sha2_384, sha2_384_192, hmac_sha2_384
Aug  4 15:12:46.838005:   HMAC_SHA2_256_128                 IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS          sha2, sha256, sha2_256, sha2_256_128, hmac_sha2_256
Aug  4 15:12:46.838008:   HMAC_SHA2_256_TRUNCBUG            IKEv1:     ESP AH  IKEv2:         AH                   
Aug  4 15:12:46.838014:   AES_XCBC_96                       IKEv1:     ESP AH  IKEv2: IKE ESP AH       native(XCBC) aes_xcbc, aes128_xcbc, aes128_xcbc_96
Aug  4 15:12:46.838018:   AES_CMAC_96                       IKEv1:     ESP AH  IKEv2:     ESP AH  FIPS              aes_cmac
Aug  4 15:12:46.838023:   NONE                              IKEv1:     ESP     IKEv2: IKE ESP     FIPS              null
Aug  4 15:12:46.838026: DH algorithms:
Aug  4 15:12:46.838031:   NONE                              IKEv1:             IKEv2: IKE ESP AH  FIPS NSS(MODP)    null, dh0
Aug  4 15:12:46.838035:   MODP1536                          IKEv1: IKE ESP AH  IKEv2: IKE ESP AH       NSS(MODP)    dh5
Aug  4 15:12:46.838039:   MODP2048                          IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS(MODP)    dh14
Aug  4 15:12:46.838044:   MODP3072                          IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS(MODP)    dh15
Aug  4 15:12:46.838048:   MODP4096                          IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS(MODP)    dh16
Aug  4 15:12:46.838053:   MODP6144                          IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS(MODP)    dh17
Aug  4 15:12:46.838057:   MODP8192                          IKEv1: IKE ESP AH  IKEv2: IKE ESP AH  FIPS NSS(MODP)    dh18
Aug  4 15:12:46.838061:   DH19                              IKEv1: IKE         IKEv2: IKE ESP AH  FIPS NSS(ECP)     ecp_256, ecp256
Aug  4 15:12:46.838066:   DH20                              IKEv1: IKE         IKEv2: IKE ESP AH  FIPS NSS(ECP)     ecp_384, ecp384
Aug  4 15:12:46.838070:   DH21                              IKEv1: IKE         IKEv2: IKE ESP AH  FIPS NSS(ECP)     ecp_521, ecp521
Aug  4 15:12:46.838074:   DH31                              IKEv1: IKE         IKEv2: IKE ESP AH       NSS(ECP)     curve25519
Aug  4 15:12:46.838077: IPCOMP algorithms:
Aug  4 15:12:46.838081:   DEFLATE                           IKEv1:     ESP AH  IKEv2:     ESP AH  FIPS             
Aug  4 15:12:46.838085:   LZS                               IKEv1:             IKEv2:     ESP AH  FIPS             
Aug  4 15:12:46.838089:   LZJH                              IKEv1:             IKEv2:     ESP AH  FIPS             
Aug  4 15:12:46.838093: testing CAMELLIA_CBC:
Aug  4 15:12:46.838096:   Camellia: 16 bytes with 128-bit key
Aug  4 15:12:46.838162:   Camellia: 16 bytes with 128-bit key
Aug  4 15:12:46.838201:   Camellia: 16 bytes with 256-bit key
Aug  4 15:12:46.838243:   Camellia: 16 bytes with 256-bit key
Aug  4 15:12:46.838280: testing AES_GCM_16:
Aug  4 15:12:46.838284:   empty string
Aug  4 15:12:46.838319:   one block
Aug  4 15:12:46.838352:   two blocks
Aug  4 15:12:46.838385:   two blocks with associated data
Aug  4 15:12:46.838424: testing AES_CTR:
Aug  4 15:12:46.838428:   Encrypting 16 octets using AES-CTR with 128-bit key
Aug  4 15:12:46.838464:   Encrypting 32 octets using AES-CTR with 128-bit key
Aug  4 15:12:46.838502:   Encrypting 36 octets using AES-CTR with 128-bit key
Aug  4 15:12:46.838541:   Encrypting 16 octets using AES-CTR with 192-bit key
Aug  4 15:12:46.838576:   Encrypting 32 octets using AES-CTR with 192-bit key
Aug  4 15:12:46.838613:   Encrypting 36 octets using AES-CTR with 192-bit key
Aug  4 15:12:46.838651:   Encrypting 16 octets using AES-CTR with 256-bit key
Aug  4 15:12:46.838687:   Encrypting 32 octets using AES-CTR with 256-bit key
Aug  4 15:12:46.838724:   Encrypting 36 octets using AES-CTR with 256-bit key
Aug  4 15:12:46.838763: testing AES_CBC:
Aug  4 15:12:46.838766:   Encrypting 16 bytes (1 block) using AES-CBC with 128-bit key
Aug  4 15:12:46.838801:   Encrypting 32 bytes (2 blocks) using AES-CBC with 128-bit key
Aug  4 15:12:46.838841:   Encrypting 48 bytes (3 blocks) using AES-CBC with 128-bit key
Aug  4 15:12:46.838881:   Encrypting 64 bytes (4 blocks) using AES-CBC with 128-bit key
Aug  4 15:12:46.838928: testing AES_XCBC:
Aug  4 15:12:46.838932:   RFC 3566 Test Case 1: AES-XCBC-MAC-96 with 0-byte input
Aug  4 15:12:46.839126:   RFC 3566 Test Case 2: AES-XCBC-MAC-96 with 3-byte input
Aug  4 15:12:46.839291:   RFC 3566 Test Case 3: AES-XCBC-MAC-96 with 16-byte input
Aug  4 15:12:46.839444:   RFC 3566 Test Case 4: AES-XCBC-MAC-96 with 20-byte input
Aug  4 15:12:46.839600:   RFC 3566 Test Case 5: AES-XCBC-MAC-96 with 32-byte input
Aug  4 15:12:46.839756:   RFC 3566 Test Case 6: AES-XCBC-MAC-96 with 34-byte input
Aug  4 15:12:46.839937:   RFC 3566 Test Case 7: AES-XCBC-MAC-96 with 1000-byte input
Aug  4 15:12:46.840373:   RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 16)
Aug  4 15:12:46.840529:   RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 10)
Aug  4 15:12:46.840698:   RFC 4434 Test Case AES-XCBC-PRF-128 with 20-byte input (key length 18)
Aug  4 15:12:46.840990: testing HMAC_MD5:
Aug  4 15:12:46.840997:   RFC 2104: MD5_HMAC test 1
Aug  4 15:12:46.841200:   RFC 2104: MD5_HMAC test 2
Aug  4 15:12:46.841390:   RFC 2104: MD5_HMAC test 3
Aug  4 15:12:46.841582: testing HMAC_SHA1:
Aug  4 15:12:46.841585:   CAVP: IKEv2 key derivation with HMAC-SHA1
Aug  4 15:12:46.842055: 8 CPU cores online
Aug  4 15:12:46.842062: starting up 7 helper threads
Aug  4 15:12:46.842128: started thread for helper 0
Aug  4 15:12:46.842174: helper(1) seccomp security disabled for crypto helper 1
Aug  4 15:12:46.842188: started thread for helper 1
Aug  4 15:12:46.842219: helper(2) seccomp security disabled for crypto helper 2
Aug  4 15:12:46.842236: started thread for helper 2
Aug  4 15:12:46.842258: helper(3) seccomp security disabled for crypto helper 3
Aug  4 15:12:46.842269: started thread for helper 3
Aug  4 15:12:46.842296: helper(4) seccomp security disabled for crypto helper 4
Aug  4 15:12:46.842311: started thread for helper 4
Aug  4 15:12:46.842323: helper(5) seccomp security disabled for crypto helper 5
Aug  4 15:12:46.842346: started thread for helper 5
Aug  4 15:12:46.842369: helper(6) seccomp security disabled for crypto helper 6
Aug  4 15:12:46.842376: started thread for helper 6
Aug  4 15:12:46.842390: using Linux xfrm kernel support code on #1 SMP PREEMPT_DYNAMIC Thu Jul 20 09:11:28 EDT 2023
Aug  4 15:12:46.842393: helper(7) seccomp security disabled for crypto helper 7
Aug  4 15:12:46.842707: selinux support is NOT enabled.
Aug  4 15:12:46.842728: systemd watchdog not enabled - not sending watchdog keepalives
Aug  4 15:12:46.843813: seccomp security disabled
Aug  4 15:12:46.848083: listening for IKE messages
Aug  4 15:12:46.848252: Kernel supports NIC esp-hw-offload
Aug  4 15:12:46.848534: adding UDP interface ovn-k8s-mp0 10.129.0.2:500
Aug  4 15:12:46.848624: adding UDP interface ovn-k8s-mp0 10.129.0.2:4500
Aug  4 15:12:46.848654: adding UDP interface br-ex 169.254.169.2:500
Aug  4 15:12:46.848681: adding UDP interface br-ex 169.254.169.2:4500
Aug  4 15:12:46.848713: adding UDP interface br-ex 10.0.0.8:500
Aug  4 15:12:46.848740: adding UDP interface br-ex 10.0.0.8:4500
Aug  4 15:12:46.848767: adding UDP interface lo 127.0.0.1:500
Aug  4 15:12:46.848793: adding UDP interface lo 127.0.0.1:4500
Aug  4 15:12:46.848824: adding UDP interface lo [::1]:500
Aug  4 15:12:46.848853: adding UDP interface lo [::1]:4500
Aug  4 15:12:46.851160: loading secrets from "/etc/ipsec.secrets"
Aug  4 15:12:46.851214: no secrets filename matched "/etc/ipsec.d/*.secrets"
Aug  4 15:12:47.053369: loading secrets from "/etc/ipsec.secrets"

sh-4.4# tcpdump -i any esp
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
^C
0 packets captured
sh-5.1# ovn-nbctl --no-leader-only get nb_global . ipsec
false
 

Version-Release number of selected component (if applicable):

openshift/cluster-network-operator#1874 

How reproducible:

Always

Steps to Reproduce:

1. Install an OVN cluster and enable IPSec at runtime
2.
3.

Actual results:

no esp packets seen across the nodes

Expected results:

esp traffic should be seen across the nodes

Additional info:

 

oc-mirror is a GA product as of OpenShift 4.11.

The goal of this feature is to address future customer requests for new features or capabilities in oc-mirror.

Overview

This epic is a simple tracker epic for the proposed work and analysis for 4.14 delivery

Proposed title of this feature request

Achieve feature parity for recently introduced functionality for all modes of operation

Nature and description of the request

Currently there are gaps in functionality within oc mirror that we would like addressed.

1. Support oci: references within mirror.operators[].catalog in an ImageSetConfiguration when running in all modes of operation with the full functionality provided by oc mirror.

Currently oci: references such as the following are allowed only in limited circumstances:

mirror:
  operators:
  - catalog: oci:///tmp/oci/ocp11840
  - catalog: icr.io/cpopen/ibm-operator-catalog

Currently supported scenarios

  • Mirror to Mirror

In this mode of operation the images are fetched from the oci: reference rather than being pulled from a source docker image repository. These catalogs are processed through similar (yet different) mechanisms compared to docker image references. The end result in this scenario is that the catalog is potentially modified and images (i.e. catalog, bundle, related images, etc.) are pushed to their final docker image repository. This provides the full capabilities offered by oc mirror (e.g. catalog "filtering", image pruning, metadata manipulation to keep track of what has been mirrored, etc.)
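A hedged usage example of this mode, combining the ring-fencing flags listed earlier with an illustrative destination registry:

oc mirror --config=./imageset-config.yaml --use-oci-feature --oci-feature-action=mirror docker://registry.example.com/mirror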

Desired scenarios
In the following scenarios we would like oci: references to be processed in a similar way to how docker references are handled (as close as possible anyway given the different APIs involved). Ultimately we want oci: catalog references to provide the full set of functionality currently available for catalogs provided as a docker image reference. In other words we want full feature parity (e.g. catalog "filtering", image pruning, metadata manipulation to keep track of what has been mirrored, etc.)

  • Mirror to Disk

In this mode of operation the images are fetched from the oci: reference rather than being pulled from a docker image repository. These catalogs are processed through similar yet different mechanisms compared to docker image references. The end result of this scenario is that all mappings and catalogs are packaged into tar archives (i.e. the "imageset").

  • Disk to Mirror

In this mode of operation the tar archives (i.e. the "imageset") are processed via the "publish mechanism" which means unpacking the tar archives, processing the metadata, pruning images, rebuilding catalogs, and pushing images to their destination. In theory if the mirror-to-disk scenario is handled properly, then this mode should "just work".
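A hedged sketch of the two halves of this flow, with illustrative archive and registry paths (archive name and location are illustrative):

# mirror to disk: produce the imageset tar archives locally
oc mirror --config=./imageset-config.yaml file://./archive
# disk to mirror: publish the generated archive to the target registry
oc mirror --from=./archive/mirror_seq1_000000.tar docker://registry.example.com/mirror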

Below the line was the original RFE for requesting the OCI feature and is only provided for reference.

 

As an oc-mirror user, I would like mirrored operator catalogs to have valid caches that reflect the contents of the catalog (configs folder), based on the filtering done in the ImageSetConfig for that catalog,

so that the catalog image starts efficiently in a cluster.

Tasks:

  • white-out /tmp on all manifests (per platform)
  • Recreate the cache under /tmp/cache using
    • extract the whole catalog
    • use the opm binary included in the extracted catalog to call (command line; see the sketch after this list)
opm serve /configs --cache-dir /tmp/cache --cache-only
  • Create a new layer from /configs and /tmp/cache
    • the /tmp is compatible with all platforms
  • oc-mirror should not change the CMD nor ENTRYPOINT of the image
  • Rebuild catalog image up to the index (manifest list)
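A hedged shell sketch of the cache-regeneration step referenced in the task list above (image reference and local paths are illustrative):

# extract the declarative config and the opm binary shipped in the catalog image
oc image extract registry.example.com/redhat/redhat-operator-index:v4.14 \
  --path /configs:./catalog/configs --path /bin/opm:./catalog
# regenerate the cache from the (possibly filtered) configs directory
chmod +x ./catalog/opm
./catalog/opm serve ./catalog/configs --cache-dir ./catalog/cache --cache-only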

Acceptance criteria:

  • Run the catalog container with command opm serve <configDir> --cache-dir=<cacheDir> --cache-only --cache-enforce-integrity to verify the integrity of the cache
  • 4.14 catalogs mirrored with oc-mirror v4.14 run correctly in a cluster
    • when mirrored with mirrorToMirror workflow
    • when mirrored with mirrorToMirror workflow with --include-oci-local-catalogs
    • when mirrored with mirrorToDisk + diskToMirror workflow
  • 4.14 catalogs mirrored with oc-mirror v4.14 use the pre-computed cache (not sure how to test this)
  • catalogs <= 4.13 mirrored with oc-mirror v4.14 run correctly in a cluster (this is not something we publish as supported)

Description of problem:

The customer was able to limit the nested repository path with "oc adm catalog mirror" by using the "--max-components" argument, but there is no equivalent option in the "oc-mirror" binary, even though we are suggesting "oc-mirror" for mirroring. For example:
Mirroring will work if we mirror like below:
oc mirror --config=./imageset-config.yaml docker://registry.gitlab.com/xxx/yyy
Mirroring will fail with 401 unauthorized if we add one more nested path like below:
oc mirror --config=./imageset-config.yaml docker://registry.gitlab.com/xxx/yyy/zzz

Version-Release number of selected component (if applicable):

 

How reproducible:

We can reproduce the issue by using a registry that does not support deep nested paths.

Steps to Reproduce:

1. Create a imageset to mirror any operator

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: ./oc-mirror-metadata
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.12
    packages:
    - name: local-storage-operator
      channels:
      - name: stable

2. Do the mirroring to a registry that does not support deep nested repository paths. Here it is GitLab, which does not support nesting beyond 3 levels deep.

oc mirror --config=./imageset-config.yaml docker://registry.gitlab.com/xxx/yyy/zzz

This mirroring will fail with a 401 unauthorized error.
 
3. If we try to mirror the same imageset after removing one path level, it will work without any issues, like below:

oc mirror --config=./imageset-config.yaml docker://registry.gitlab.com/xxx/yyy 

Actual results:

 

Expected results:

Need an alternative to "--max-components" to limit the nested path depth in "oc-mirror".

Additional info:

 

This is a followup to https://issues.redhat.com/browse/OPNET-13. In that epic we implemented limited support for dual stack on VSphere, but due to limitations in upstream Kubernetes we were not able to support all of the use cases we do on baremetal. This epic is to track our work up and downstream to finish the dual stack implementation.


Goal:
As a cluster administrator, I want OpenShift to include a recent HAProxy version, so that I have the latest available performance and security fixes.  

 Description:
We should strive to follow upstream HAProxy releases by bumping the HAProxy version that we ship in OpenShift with every 4.y release, so that OpenShift benefits from upstream performance and security fixes, and so that we avoid large version-number jumps when an urgent fix necessitates bumping to the latest HAProxy release.  This bump should happen as early as possible in the OpenShift release cycle, so as to maximize soak time.   

For OpenShift 4.13, this means bumping to 2.6.  

As a cluster administrator, 

I want OpenShift to include a recent HAProxy version, 

so that I have the latest available performance and security fixes.  

 

We should strive to follow upstream HAProxy releases by bumping the HAProxy version that we ship in OpenShift with every 4.y release, so that OpenShift benefits from upstream performance and security fixes, and so that we avoid large version-number jumps when an urgent fix necessitates bumping to the latest HAProxy release.  This bump should happen as early as possible in the OpenShift release cycle, so as to maximize soak time.   

For OpenShift 4.14, this means bumping to 2.6.  

Bump the HAProxy version in dist-git so that OCP 4.13 ships HAProxy 2.6.13, with this patch added on top: https://git.haproxy.org/?p=haproxy-2.6.git;a=commit;h=2b0aafdc92f691bc4b987300c9001a7cc3fb8d08. The patch fixes the segfault that was being tracked as OCPBUGS-13232.

This patch is in HAProxy 2.6.14, so we can stop carrying the patch once we bump to HAProxy 2.6.14 or newer in a subsequent OCP release.
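As a quick post-bump sanity check, one hedged way to see which HAProxy version a running cluster ships (assumes the default router deployment name):

oc -n openshift-ingress rsh deployment/router-default haproxy -v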

Feature Overview (aka. Goal Summary)  

Tang-enforced, network-bound disk encryption has been available in OpenShift for some time, but all intended Tang-endpoints contributing unique key material to the process must be reachable during RHEL CoreOS provisioning in order to complete deployment.

If a user wants to require that 3 of 6 Tang servers be reachable, then all 6 must be reachable during the provisioning process. This might not be possible due to maintenance, an outage, or simply network policy during deployment.

Enabling offline provisioning for first boot will help all of these scenarios.

 

Goals (aka. expected user outcomes)

The user can now provision a cluster with only some (or none) of the Tang servers reachable on first boot. The second boot, of course, will be subject to the configured Tang requirements.
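For context, a hedged Butane sketch of a Tang-bound boot device with a reachability threshold (URLs and thumbprints are illustrative):

variant: openshift
version: 4.14.0
metadata:
  name: worker-tang
  labels:
    machineconfiguration.openshift.io/role: worker
boot_device:
  luks:
    tang:
    - url: https://tang1.example.com
      thumbprint: REPLACE_WITH_THUMBPRINT
    - url: https://tang2.example.com
      thumbprint: REPLACE_WITH_THUMBPRINT
    - url: https://tang3.example.com
      thumbprint: REPLACE_WITH_THUMBPRINT
    threshold: 2                   # only 2 of the 3 servers must answer to unlock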

Done when:

  • Ignition spec default has been updated to 3.4
  • reconcile field (dependent on ignition 3.4)
  • consider Tang rotation? (write another epic)

This requires messy/complex work: grepping through for prior references to Ignition and updating Golang types that reference other versions.

Assumption that existing tests are sufficient to catch discrepancies. 

Goal

  • Allow users to set different Root volume types for each Control plane machine as a day-2 operation through CPMS
  • Allow users to set different Root volume types for each Control plane machine as install-time configuration through install-config

Why is this important?

  • In some OpenStack clouds, volume types are used to target separate OpenStack failure domains. With this feature, users can spread each Control plane root volume across separate OpenStack failure domains using the ControlPlaneMachineSet.

Acceptance Criteria

  • Once the CPMS is updated with different root volume types in the Failure domains, CCPMSO spins up new master machines with their root volumes spread accordingly.

Dependencies (internal and external)

  1. OpenShift-on-OpenStack integration with CPMS (OSASINFRA-3100)

Previous Work (Optional):

  1. 4.13 FailureDomains tech preview (OSASINFRA-2998)

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

FID: https://docs.google.com/document/d/1OEB7Vml1-TmpWZWbvHhf3lnrtEU5JZt2Sptcnu3Kv2I/edit#heading=h.fu58ua5viwam

  • add the JSON array controlPlane.platform.openstack.rootVolume.types (notice the "s") in install-config (this is an API addition); see the sketch after this list
  • add validation to prevent both rootVolume.type and rootVolume.types from being set
  • add validation to ensure that if variable fields (compute availability zones, storage availability zones, root volume types) have more than one value, they have equal lengths
  • change Machine generation to vary rootVolume.volumeType according to the machine-pool rootVolume.types
  • instrument the Terraform code to apply variable volume types
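A hedged install-config sketch of the proposed field, with illustrative zone and volume type names:

controlPlane:
  name: master
  replicas: 3
  platform:
    openstack:
      zones: ["az0", "az1", "az2"]                        # compute availability zones
      rootVolume:
        size: 100
        types: ["fast-az0", "fast-az1", "fast-az2"]       # one volume type per failure domain
        zones: ["cinder-az0", "cinder-az1", "cinder-az2"] # storage availability zones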

Goal

Allow to point to an existing OVA image stored in vSphere from the OpenShift installer, replacing the current method that uploads the OVA template every time an OpenShift cluster is installed.

Why is this important?

This is an improvement that makes the installation more efficient by not having to upload an OVA from where openshift-install is running every time a cluster is installed, saving time and bandwidth. For example, if an administrator is installing over a VPN, the OVA is uploaded through it to the target environment every time an OpenShift cluster is installed. Having a centralized OVA ready to use for installing new clusters, without uploading it from where the installer is run, makes the administration process more efficient.

Epic Goal

  • To allow the use of a pre-existing RHCOS virtual machine or template via the IPI installer.

Why is this important?

  • It is a very common workflow in vSphere to upload an OVA. In the disconnected scenario, the requirement of using a local web server, copying an OVA to that web server, and then running the installer is a poor experience.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)  

Rebase openshift-etcd to latest upstream stable version 3.5.9

Goals (aka. expected user outcomes)

OpenShift's openshift-etcd should benefit from the latest enhancements in version 3.5.9.

 

https://github.com/etcd-io/etcd/issues/13538

We're currently on etcd 3.5.6; since then there has been at least one newer release. This epic description is to track changes that we need to pay attention to:

 

Golang 1.17 update

In 3.5.7 etcd was moved to 1.17 to address some vulnerabilities:

https://github.com/etcd-io/etcd/blob/main/CHANGELOG/CHANGELOG-3.5.md#go

We need to update our definitions in the release repo to match this and test what impact it has.

EDIT: now moving onto 1.19 directly: https://github.com/etcd-io/etcd/pull/15337

 

WAL fix carry

3.5.6 had a nasty WAL bug that was hit by some customers, fixed with https://github.com/etcd-io/etcd/pull/15069

Due to the Golang upgrade we carried that patch through OCPBUGS-5458

When we upgrade we need to ensure the commits are properly handled and ordered with this carry.

 

IPv6 Formatting

There were some comparison issues where the same IPv6 addresses had different formats. This was fixed in https://github.com/etcd-io/etcd/pull/15187, and we need to test what impact this has on our IPv6-based SKUs.

 

serializable memberlist 

This is a carry we have had for some time: https://github.com/openshift/etcd/commit/26d7d842f6fb968e55fa5dbbd21bd6e4ea4ace50

This is now officially fixed (slightly different) with the options pattern in: https://github.com/etcd-io/etcd/pull/15261 

We need to drop the carry patch and take the upstream version when rebasing.

 

 


Feature Goal

  • Enable platform=external to support onboarding new partners, e.g. Oracle Cloud Infrastructure and VCSP partners.
  • Create a new platform type, working name "External", that will signify when a cluster is deployed on a partner infrastructure where core cluster components have been replaced by the partner. “External” is different from our current platform types in that it will signal that the infrastructure is specifically not “None” or any of the known providers (eg AWS, GCP, etc). This will allow infrastructure partners to clearly designate when their OpenShift deployments contain components that replace the core Red Hat components.

This work will require updates to the core OpenShift API repository to add the new platform type, and then a distribution of this change to all components that use the platform type information. For components that partners might replace, per-component action will need to be taken, with the project team's guidance, to ensure that the component properly handles the "External" platform. These changes will look slightly different for each component.

To integrate these changes more easily into OpenShift, it is possible to take a multi-phase approach which could be spread over a release boundary (eg phase 1 is done in 4.X, phase 2 is done in 4.X+1).

OCPBU-5: Phase 1

  • Write platform “External” enhancement.
  • Evaluate changes to cluster capability annotations to ensure coverage for all replaceable components.
  • Meet with component teams to plan specific changes that will allow for supplement or replacement under platform "External".
  • Start implementing changes towards Phase 2.

OCPBU-510: Phase 2

  • Update OpenShift API with new platform and ensure all components have updated dependencies.
  • Update capabilities API to include coverage for all replaceable components.
  • Ensure all Red Hat operators tolerate the "External" platform and treat it the same as "None" platform.

OCPBU-329: Phase.Next

  • TBD

Why is this important?

  • As partners begin to supplement OpenShift's core functionality with their own platform specific components, having a way to recognize clusters that are in this state helps Red Hat created components to know when they should expect their functionality to be replaced or supplemented. Adding a new platform type is a significant data point that will allow Red Hat components to understand the cluster configuration and make any specific adjustments to their operation while a partner's component may be performing a similar duty.
  • The new platform type also helps with support to give a clear signal that a cluster has modifications to its core components that might require additional interaction with the partner instead of Red Hat. When combined with the cluster capabilities configuration, the platform "External" can be used to positively identify when a cluster is being supplemented by a partner, and which components are being supplemented or replaced.

Scenarios

  1. A partner wishes to replace the Machine controller with a custom version that they have written for their infrastructure. Setting the platform to "External" and advertising the Machine API capability gives a clear signal to the Red Hat created Machine API components that they should start the infrastructure generic controllers but not start a Machine controller.
  2. A partner wishes to add their own Cloud Controller Manager (CCM) written for their infrastructure. Setting the platform to "External" and advertising the CCM capability gives a clear signal to the Red Hat created CCM operator that the cluster should be configured for an external CCM that will be managed outside the operator. Although the Red Hat operator will not provide this functionality, it will configure the cluster to expect a CCM.

Acceptance Criteria

Phase 1

  • Partners can read "External" platform enhancement and plan for their platform integrations.
  • Teams can view jira cards for component changes and capability updates and plan their work as appropriate.

Phase 2

  • Components running in cluster can detect the “External” platform through the Infrastructure config API
  • Components running in cluster react to “External” platform as if it is “None” platform
  • Partners can disable any of the platform specific components through the capabilities API

Phase 3

  • Components running in cluster react to the “External” platform based on their function.
    • for example, the Machine API Operator needs to run a set of controllers that are platform agnostic when running in platform “External” mode.
    • the specific component reactions are difficult to predict currently; this criterion could change based on the output of phase 1.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. Identifying OpenShift Components for Install Flexibility

Open questions:

  1. Phase 1 requires talking with several component teams, the specific action that will be needed will depend on the needs of the specific component. At the least the components need to treat platform "External" as "None", but there could be more changes depending on the component (eg Machine API Operator running non-platform specific controllers).

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • Empower External platform type users to specify when they will run their own CCM

Why is this important?

  • For partners wishing to use components that require zonal awareness provided by the infrastructure (for example CSI drivers), they will need to run their own cloud controller managers. This epic is about adding the proper configuration to OpenShift to allow users of External platform types to run their own CCMs.

Scenarios

  1. As a Red Hat partner, I would like to deploy OpenShift with my own CSI driver. To do this I need my CCM deployed as well. Having a way to instruct OpenShift to expect an external CCM deployment would allow me to do this.

Acceptance Criteria

  • CI - A new periodic test based on the External platform test would be ideal
  • Release Technical Enablement - Provide necessary release enablement details and documents.
    • Update docs.ci.openshift.org with CCM docs

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://github.com/openshift/enhancements/blob/master/enhancements/cloud-integration/infrastructure-external-platform-type.md#api-extensions
  2. https://github.com/openshift/api/pull/1409

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

As a user, I want to use the OpenShift installer to create clusters of platform type External so that I can use OpenShift more effectively on a partner provider platform.

Background

To fully support the External platform type for partners and users, the installer should recognize the external platform type when it appears in the install-config.yaml, and then properly populate the resulting infrastructure config object with the external platform type and platform name.

As defined in https://github.com/openshift/api/blob/master/config/v1/types_infrastructure.go#L241 , the external platform type allows the user to specify a name for the platform. This card is about updating the installer so that a user can provide both the external type and a platform name that will be expressed in the infrastructure manifest.

Aside from this information, the installer should continue with a normal platform "None" installation.
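A minimal sketch of what this could look like in install-config.yaml; the exact field names are defined by the installer work in this card, so treat the excerpt below and the platform name value as illustrative:

```yaml
# Sketch only: illustrative install-config.yaml excerpt selecting the External
# platform type and a partner-supplied platform name.
apiVersion: v1
baseDomain: example.com
metadata:
  name: partner-cluster
platform:
  external:
    platformName: some-partner-platform   # surfaced in the Infrastructure manifest
pullSecret: '<redacted>'
```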

Steps

  • update installer to allow platform "External" specified in the install-config.yaml
  • update installer to allow the platform name to be specified as part of the External platform configuration

Stakeholders

  • openshift cloud infra team
  • openshift installer team
  • openshift assisted installer team

Definition of Done

  • user can specify external platform in the install-config.yaml and have a cluster with External platform type and a name for the platform.
  • cluster installs as expected for platform external (similar to none)
  • Docs
  • Testing
  • this feature should allow us to update our external platform tests to make the installation easier; tests should be updated to include this methodology

User Story

As a Red Hat Partner installing OpenShift using the External platform type, I would like to install my own Cloud Controller Manager (CCM). Having a field in the Infrastructure configuration object to signal that I will install my own CCM and that Kubernetes should be configured to expect an external CCM will allow me to run my own CCM on new OpenShift deployments.

Background

This work has been defined in the External platform enhancement, and had previously been part of openshift/api. The CCM API pieces were removed for the 4.13 release of OpenShift to ensure that we did not ship unused portions of the API.

In addition to the API changes, library-go will need an update to the IsCloudProviderExternal function to detect if the External platform is selected and if the CCM should be enabled for external mode.

We will also need to check the ObserveCloudVolumePlugin function to ensure that it is not affected by the external changes and that it continues to use the external volume plugin.

After updating openshift/library-go, it will need to be re-vendored into the MCO, KCMO, and CCCMO (although this is not as critical as the other two).
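A hedged sketch of the Infrastructure object after these changes; the exact placement and naming of the CCM field are defined in the External platform enhancement and the re-reverted openshift/api PR #1409, so this is illustrative only:

```yaml
# Illustrative sketch; authoritative field names live in the External platform
# enhancement and openshift/api PR #1409.
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
spec:
  platformSpec:
    type: External
    external:
      platformName: some-partner-platform
status:
  platformStatus:
    type: External
    external:
      cloudControllerManager:
        state: External   # signals that the partner will run their own CCM
```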

Steps

  • update openshift/api with new CCM fields (re-revert #1409)
  • revendor api to library-go
  • update IsCloudProviderExternal in library-go to observe the new API fields
  • investigate ObserveCloudVolumePlugin to see if it requires changes
  • revendor library-go to MCO, KCMO, and CCCMO
  • update enhancement doc to reflect state

Stakeholders

  • openshift eng
  • oracle cloud install effort

Definition of Done

  • OpenShift can be installed with the External platform type, with the kubelet and related components using the external cloud provider flags.
  • Docs
  • this will need to be documented in the API and as part of OCPCLOUD-1581
  • Testing
  • this will need validation through unit test, integration testing may be difficult as we will need a new e2e built off the external platform with a ccm

Feature Overview

In the initial delivery of CoreOS Layering, it is required that administrators provide their own build environment to customize RHCOS images. That could be a traditional RHEL environment or potentially an enterprising administrator with some knowledge of OCP Builds could set theirs up on-cluster.

The primary virtue of an on-cluster build path is to continue using the cluster to manage the cluster. No external dependency, batteries-included.

 

MVP: bring the off-cluster build environment on-cluster

    • Repo control
      • rpm-ostree needs repo management commands
    • Entitlement management

In the context of the Machine Config Operator (MCO) in Red Hat OpenShift, on-cluster builds refer to the process of building an OS image directly on the OpenShift cluster, rather than building them outside the cluster (such as on a local machine or continuous integration (CI) pipeline) and then making a configuration change so that the cluster uses them. By doing this, we enable cluster administrators to have more control over the contents and configuration of their clusters’ OS image through a familiar interface (MachineConfigs and in the future, Dockerfiles).

At the layering sync meeting on Thursday, August 10th, it was decided that for this to be considered ready for Dev / Tech Preview, cluster admins need a way to inject custom Dockerfiles into their on-cluster builds.

 

(Commentary: It was also decided 4 months ago that this was not an MVP requirement in https://docs.google.com/document/d/1QSsq0mCgOSUoKZ2TpCWjzrQpKfMUL9thUFBMaPxYSLY/edit#heading=h.jqagm7kwv0lg. And quite frankly, this requirement should have been known at that point in time as opposed to the week before tech preview.)

This is the "consumption" side of the security – rpm-ostree needs to be able to retrieve images from the internal registry seamlessly.

This will involve setting up (or using some existing) pull secrets, and then getting them to the proper location on disk so that rpm-ostree can use them to pull images.

To speed development for on-cluster builds and avoid a lot of complex code paths, the decision was made to put all functionality related to building OS images and managing internal registries into a separate binary within the MCO.

Eventually, this binary will be responsible for running the productionized BuildController and know how to respond to Machine OS Builder API objects. However, until the productionized BuildController and opt-in portions are ready, the first pass of this binary will be much simpler: For now, it can connect to the API server and print a "Hello World".

 

Done When:

  • We have a new binary under cmd/machine-os-builder. This binary will be built alongside the current MCO components and will be baked into the MCO image.
  • The Dockerfile, Makefile, and build scripts will need some modification so that they know how to build cmd/machine-os-builder.
  • A Deployment manifest is created under manifests/ which is set up to start a single instance of the new binary, though we don't want it to start up by default right now since it won't do anything useful.

The first phase of the layering effort involved creating a BuildController, whose job is to start and manage builds using the OpenShift Build API. We can use the work done to create the BuildController as the basis for our MVP. However, what we need from BuildController right now is less than BuildController currently provides. With that in mind, we need to remove certain parts of BuildController to create a more streamlined and simpler implementation ideal for an MVP.

 

Done when a version of BuildController is landed which does the following things:

  • Listens for all MachineConfigPool events. If a MachineConfigPool has a specific label or annotation (e.g., machineconfiguration.openshift.io/layering-enabled), the BuildController should retrieve the latest rendered MachineConfig associated with the MachineConfigPool, generate a series of inputs to a builder backend (for now, the OpenShift Build API can be the first backend), then update the MachineConfigPool with the outcome of that action. In the case of a successful build, the MachineConfigPool should be updated with the image pullspec for the newly-built image. For now, this can come in the form of an annotation or a label (e.g., machineconfiguration.openshift.io/desired-os-image = "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/coreos@sha256:abcdef1234567890..."). But eventually, it should be a Status field on the MachineConfigPool object.
  • Reads from a ConfigMap which contains the following items (let's call it machine-os-builder-config for now; see the sketch after this list):
    • Name of the base OS image pull secret.
    • Name of the final OS image push secret.
    • Target container registry and org / repo information for where to push the final OS image (e.g., image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/coreos).
  • All functionality around managing ImageStreams and OpenShift Builds is removed or decoupled. In the case of the OpenShift Build functionality, it will be decoupled instead of completely removed. Additionally, it should not use BuildConfigs. It should instead create and manage image Build objects directly.
  • Use contexts for handling shutdowns and timeouts.
  • Unit tests are written for the major BuildController functionalities using either FakeClient or EnvTest.
  • The modified BuildController and its tests are merged into the master branch of the MCO. Note: This does not mean that it will be immediately active in the MCO's execution path. However, tests will be executed in CI.
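A minimal sketch of the opt-in label and the machine-os-builder-config ConfigMap described above; the label value and the ConfigMap key names are illustrative assumptions, not settled API:

```yaml
# Hypothetical sketches only: the label value and key names are illustrative.
# Opt-in label on a MachineConfigPool that BuildController watches for:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
  labels:
    machineconfiguration.openshift.io/layering-enabled: ""
---
# ConfigMap consumed by BuildController (working name machine-os-builder-config):
apiVersion: v1
kind: ConfigMap
metadata:
  name: machine-os-builder-config
  namespace: openshift-machine-config-operator
data:
  baseImagePullSecretName: base-os-image-pull-secret     # name of the base OS image pull secret
  finalImagePushSecretName: final-os-image-push-secret   # name of the final OS image push secret
  finalImagePullspec: image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/coreos
```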

The second phase of the layering effort involved creating a BuildController, whose job is to start and manage builds of OS images. While it should be able to perform those functions on its own, getting the built OS image onto each of the cluster nodes involves modifying other parts of the MCO to be layering-aware. To that end, there are three pieces involved, some of which will require modification:

Render Controller

Right now, the render controller listens for incoming MachineConfig changes. It generates the rendered config which is comprised of all of the MachineConfigs for a given MachineConfigPool. Once rendered, the Render Controller updates the MachineConfigPool to point to the new config. This portion of the MCO will not likely need any modification that I'm aware of at the moment.

Node Controller

The Node Controller listens for MachineConfigPool config changes. Whenever it identifies that a change has occurred, it applies the machineconfiguration.openshift.io/desiredConfig annotation to all the nodes in the targeted MachineConfigPool which causes the Machine Config Daemon (MCD) to apply the new configs. With this new layering mechanism, we'll need to add the additional annotation of machineconfiguration.openshift.io/desiredOSimage which will contain the fully-qualified pullspec for the new OS image (referenced by the image SHA256 sum). To be clear, we will not be replacing the desiredConfig annotation with the desiredOSimage annotation; both will still be used. This will allow Config Drift Monitor to continue to function the way it does with no modification required.
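For illustration, a node carrying both annotations after this change might look like the following sketch (the rendered config name and image digest are examples only):

```yaml
# Sketch: a node annotated by the Node Controller for a layered pool.
# Both annotations remain in use; values are illustrative.
apiVersion: v1
kind: Node
metadata:
  name: worker-0
  annotations:
    machineconfiguration.openshift.io/desiredConfig: rendered-worker-abc123def456
    machineconfiguration.openshift.io/desiredOSimage: image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/coreos@sha256:abcdef1234567890...
```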

Machine Config Daemon

Right now, the MCD listens to Node objects for changes to the machineconfiguration.openshift.io/desiredConfig annotation. With the new desiredOSimage annotation being present, the MCD will need to skip the parts of the update loop which write files and systemd units to disk. Instead, it will skip directly to the rpm-ostree application phase (after making sure the correct pull secrets are in place, etc.).

 

Done When:

  • The above modifications are made.
  • Each modification has been done with appropriate unit tests where feasible.

Feature Overview

Goals

  • Support OpenShift to be deployed from day-0 on AWS Local Zones
  • Support an existing OpenShift cluster to deploy compute Nodes on AWS Local Zones (day-2)

AWS Local Zones support - feature delivered in phases:

  • Phase 0 (OCPPLAN-9630): Document how to create compute nodes on AWS Local Zones in day-0 (SPLAT-635)
  • Phase 1 (OCPBU-2): Create edge compute pool to generate MachineSets for nodes with NoSchedule taints when installing a cluster in an existing VPC with AWS Local Zone subnets (SPLAT-636)
  • Phase 2 (OCPBU-351): Installer automates network resources creation on Local Zone based on the edge compute pool (SPLAT-657)

Requirements

  • This section lists specific needs or objectives that the Feature must deliver to be satisfied. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

 

Epic Goal

Fully automated installation creating subnets in AWS Local Zones when the zone names are added to the edge compute pool in install-config.yaml.

  • The installer should create the subnets in the Local Zones according to the configuration of the "edge" compute pool provided in install-config.yaml, as sketched below.
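A sketch of the edge compute pool in install-config.yaml, assuming the pool/zone syntax from the earlier phases; the zone names and region are illustrative:

```yaml
# Sketch: "edge" compute pool listing AWS Local Zone names; the installer
# creates the Local Zone subnets from this. Zone names are illustrative.
compute:
- name: edge
  platform:
    aws:
      zones:
      - us-east-1-nyc-1a
      - us-east-1-bos-1a
platform:
  aws:
    region: us-east-1
```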

Why is this important?

  • Users can extend the presence of worker nodes closer to the metropolitan regions, where the users or on-premises workloads are running, decreasing the time to deliver their workloads to their clients.

Scenarios

  • As a cluster admin, I would like to install OpenShift clusters, extending the compute nodes to the Local Zones in my day-zero operations without needing to set up the network and compute dependencies, so I can speed up the edge adoption in my organization using OCP.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated.
  • CI - custom jobs should be added to test Local Zone provisioning
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • The PR on the installer repo should be merged after being approved by the Installer team, QE, and docs
  • The product documentation has been created

Dependencies (internal and external)

  1. SPLAT-636 : install a cluster in existing VPC extending workers to Local Zones
  2. OCPBUGSM-46513 : Bug - Ingress Controller should not add Local Zones subnets to network routers/LBs (Classic/NLB)

Previous Work (Optional):

  1. Enhancement 1232
  2. SPLAT-636 : AWS Local Zones - Phase 1 IPI edge pool - Installer support to automatically create the MachineSets when installing in existing VPC

Open questions:

Done Checklist

Feature Overview

  • As a Cluster Administrator, I want to opt-out of certain operators at deployment time using any of the supported installation methods (UPI, IPI, Assisted Installer, Agent-based Installer) from UI (e.g. OCP Console, OCM, Assisted Installer), CLI (e.g. oc, rosa), and API.
  • As a Cluster Administrator, I want to opt-in to previously-disabled operators (at deployment time) from UI (e.g. OCP Console, OCM, Assisted Installer), CLI (e.g. oc, rosa), and API.
  • As a ROSA service administrator, I want to exclude/disable Cluster Monitoring when I deploy OpenShift with HyperShift — using any of the supported installation methods including the ROSA wizard in OCM and rosa cli — since I get cluster metrics from the control plane. This configuration should be persisted not only through initial deployment but also through cluster lifecycle operations like upgrades.
  • As a ROSA service administrator, I want to exclude/disable Ingress Operator when I deploy OpenShift with HyperShift — using any of the supported installation methods including the ROSA wizard in OCM and rosa cli — as I want to use my preferred load balancer (i.e. AWS load balancer). This configuration should be persisted not only through initial deployment but also through cluster lifecycle operations like upgrades.

Goals

  • Make it possible for customers and Red Hat teams producing OCP distributions/topologies/experiences to enable/disable some CVO components while still keeping their cluster supported.

Scenarios

  1. This feature must consider the different deployment footprints including self-managed and managed OpenShift, connected vs. disconnected (restricted and air-gapped), supported topologies (standard HA, compact cluster, SNO), etc.
  2. Enabled/disabled configuration must persist throughout cluster lifecycle including upgrades.
  3. If there's any risk/impact of data loss or service unavailability (for Day 2 operations), the System must provide guidance on what the risks are and let the user decide if the risk is worth undertaking.

Requirements

  • This section lists specific needs or objectives that the Feature must deliver to be satisfied. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • ...

Out of Scope

Background, and strategic fit

This is part of the overall multi-release Composable OpenShift effort (OCPPLAN-9638), which is being delivered in multiple phases:

Phase 1 (OpenShift 4.11): OCPPLAN-7589 Provide a way with CVO to allow disabling and enabling of operators

  • CORS-1873 Installer to allow users to select OpenShift components to be included/excluded
  • OTA-555 Provide a way with CVO to allow disabling and enabling of operators
  • OLM-2415 Make the marketplace operator optional
  • SO-11 Make samples operator optional
  • METAL-162 Make cluster baremetal operator optional
  • OCPPLAN-8286 CI Job for disabled optional capabilities

Phase 2 (OpenShift 4.12): OCPPLAN-7589 Provide a way with CVO to allow disabling and enabling of operators

Phase 3 (OpenShift 4.13): OCPBU-117

  • OTA-554 Make oc aware of cluster capabilities
  • PSAP-741 Make Node Tuning Operator (including PAO controllers) optional

Phase 4 (OpenShift 4.14): OCPSTRAT-36 (formerly OCPBU-236)

  • CCO-186 ccoctl support for credentialing optional capabilities
  • MCO-499 MCD should manage certificates via a separate, non-MC path (formerly IR-230 Make node-ca managed by CVO)
  • CNF-5642 Make cluster autoscaler optional
  • CNF-5643 - Make machine-api operator optional
  • WRKLDS-695 - Make DeploymentConfig API + controller optional
  • CNV-16274 OpenShift Virtualization on the Red Hat Application Cloud (not applicable)
  • CNF-9115 - Leverage Composable OpenShift feature to make control-plane-machine-set optional

Phase 5 (OpenShift 4.15): OCPSTRAT-421 (formerly) OCPBU-519

  • OCPBU-352 Make Ingress Operator optional
  • BUILD-565 - Make Build v1 API + controller optional
  • OBSDA-242 Make Cluster Monitoring Operator optional
  • OCPVE-630 (formerly CNF-5647) Leverage Composable OpenShift feature to make image-registry optional (replaces IR-351 - Make Image Registry Operator optional)
  • CNF-9114 - Leverage Composable OpenShift feature to make olm optional
  • CNF-9118 - Leverage Composable OpenShift feature to make cloud-credential  optional
  • CNF-9119 - Leverage Composable OpenShift feature to make cloud-controller-manager optional

Phase 6 (OpenShift 4.16): OCPSTRAT-731

  • TBD

References

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

 

 

 

Per https://github.com/openshift/enhancements/pull/922 we need `oc adm release new` to parse the resource manifests for `capability` annotations and generate a yaml file that lists the valid capability names, to embed in the release image.

This file can be used by the installer to error or warn when the install config lists capabilities for enable/disable that are not valid capability names.

 

Note: Moved a couple of cards from OTA-554 to this epic as these cards are lower priority for the 4.13 release and we could not mark them done.

While working on OTA-559, my oc#1237 broke JSON output, and needed a follow-up fix. To avoid destabilizing folks who consume the dev-tip oc, we should grow CI presubmits to exercise critical oc adm release ... pathways, to avoid that kind of accidental breakage.

oc adm release extract --included ... or some such, that only works when no release pullspec is given, where oc connects to the cluster to ask after the current release image (as it does today when you leave off a pullspec) but also collects FeatureGates and cluster profile and all that sort of stuff so it can write only the manifests it expects the CVO to be attempting to reconcile.

This would be narrowly useful for ccoctl (see CCO-178 and CCO-186), because with this extract option, ccoctl wouldn't need to try to reproduce "which of these CredentialsRequests manifests does the cluster actually want filled?" locally.

It also seems like it would be useful for anyone trying to get a better feel for what the CVO is up to in their cluster, for the same reason that it reduces distracting manifests that don't apply.

The downside is that if we screw up the inclusion logic, we could have oc diverging from the CVO, and end up increasing confusion instead of decreasing confusion. If we move the inclusion logic to library-go, that reduces the risk a bit, but there's always the possibility that users are using an oc that is older or newer than the cluster's CVO. Some way to have oc warn when the option is used but the version differs from the current CVO version would be useful, but possibly complicated to implement, unless we take shortcuts like assuming that the currently running CVO has a version matched to the ClusterVersion's status.desired target.

Definition of done (more details in the OTA-692 spike comment):

  • Add a new --included flag to $ oc adm release extract --to <dir path> <pull-spec or version-number>. The --included flag filters extracted manifests to those that are expected to be included with the cluster. 
    • Move overrides handling here and here into library-go.

 

Here is a sketch of code which W. Trevor King suggested.

Epic Goal

  • Add an optional capability that allows disabling the image registry operator entirely

Why is this important?

It is already possible to run a cluster with no instantiated image registry, but the image registry operator itself always runs.  This is an unnecessary use of resources for clusters that don't need/want a registry.  Making it possible to disable this will reduce the resource footprint as well as bug risks for clusters that don't need it, such as SNO and OKE.

 

Acceptance Criteria

  • CI - MUST be running successfully with tests automated (we have an existing CI job that runs a cluster with all optional capabilities disabled.  Passing that job will require disabling certain image registry tests when the cap is disabled)
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1.  MCO-499 must be completed first because we still need the CA management logic running even if the image registry operator is not running.

Previous Work (Optional):

  1. The optional cap architecture and guidance for adding a new capability is described here: https://github.com/openshift/enhancements/blob/master/enhancements/installer/component-selection.md

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

To enable the MCO to replace the node-ca, the registry operator needs to provide its own CAs in isolation.

Currently, the registry provides its own CAs via the "image-registry-certificates" configmap. This configmap is a merge of the service ca, storage ca, and additionalTrustedCA (from images.config.openshift.io/cluster).

Because the MCO already has access to additionalTrustedCA, the new secret does not need to contain it.

 

ACCEPTANCE CRITERIA

TBD

  1. Proposed title of this feature request:

Update ETCD datastore encryption to use AES-GCM instead of AES-CBC

2. What is the nature and description of the request?

The current ETCD datastore encryption solution uses the aes-cbc cipher. This cipher is now considered "weak" and is susceptible to a padding oracle attack.  Upstream recommends using the AES-GCM cipher. AES-GCM will require automation to rotate secrets every 200k writes.

The cipher used is hard-coded.

3. Why is this needed? (List the business requirements here).

Security conscious customers will not accept the presence and use of weak ciphers in an OpenShift cluster. Continuing to use the AES-CBC cipher will create friction in sales and, for existing customers, may result in OpenShift being blocked from being deployed in production. 

4. List any affected packages or components.

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

The Kube APIserver is used to set the encryption of data stored in etcd. See https://docs.openshift.com/container-platform/4.11/security/encrypting-etcd.html

 

Today with OpenShift 4.11 or earlier, only aescbc is allowed as the encryption field type. 

 

RFE-3095 is asking that aesgcm (which is an updated and more recent type) be supported. Furthermore, RFE-3338 is asking for more customizability, which brings us to how we have implemented cipher customization with tlsSecurityProfile. See https://docs.openshift.com/container-platform/4.11/security/tls-security-profiles.html

 

 
Why is this important? (mandatory)

AES-CBC is considered a weak cipher

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

AES-GCM encryption was enabled in cluster-openshift-apiserver-operator and cluster-openshift-authentication-operator, but not in the cluster-kube-apiserver-operator. When trying to enable aesgcm encryption in the apiserver config, the kas-operator will produce an error saying that the aesgcm provider is not supported.
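For reference, this is roughly what requesting the new cipher would look like on the cluster APIServer config once supported (a sketch; today only aescbc is accepted, which is the error described above):

```yaml
# Sketch: requesting AES-GCM etcd encryption via the APIServer config.
# The kube-apiserver operator currently rejects this value and only accepts aescbc.
apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
  name: cluster
spec:
  encryption:
    type: aesgcm
```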

Feature Overview

Support Platform external to allow installing with agent on OCI, with focus on https://www.oracle.com/cloud/cloud-at-customer/dedicated-region/faq/ for disconnected, on-prem.

Related / parent feature

OCPSTRAT-510 OpenShift on Oracle Cloud Infrastructure (OCI) with VMs

Feature Overview

Support Platform external to allow installing with agent on OCI, with focus on https://www.oracle.com/cloud/cloud-at-customer/dedicated-region/faq/ for disconnected, on-prem.

User Story:

As a user of the agent-based installer, I want to be able to:

  • validate the external platform type in the agent cluster install by providing the external platform type in the install-config.yaml

so that I can achieve

  • create agent artifacts (ISO, PXE files)

Acceptance Criteria:

Description of criteria:

  • install-config.yaml accepts the new platform type "external"
  • agent-based installer validates the supported platforms
  • agent ISO and PXE assets should be created successfully
  • Required k8s API support is added

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As a user of the agent-based installer, I want to be able to:

  • create agent ISO as well as PXE assets by providing the install-config.yaml

so that I can achieve

  • create a cluster for external cloud provider platform type (OCI)

Acceptance Criteria:

Description of criteria:

  • install-config.yaml accepts the new platform type "external"
  • validate install-config so that platformName can only be set to `oci` when the platform is external (see the sketch after this list)
  • agent-based installer validates the supported platforms
  • agent ISO and PXE assets should be created successfully
  • necessary unit tests and integration tests are added
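A minimal sketch of the relevant install-config.yaml excerpt for this story; the values are illustrative and the validation rules are what this story adds:

```yaml
# Sketch: external platform section of install-config.yaml for the
# agent-based installer targeting OCI. Values are illustrative.
platform:
  external:
    platformName: oci
```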

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

User Story:

As a user, I want to be able to:

  • generate the minimal ISO in the installer when the platform type is set to external/oci

so that I can achieve

  • successful cluster installation
  • any custom agent features such as network tui should be available when booting from minimal ISO

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview (aka. Goal Summary)  

Support OpenShift installation in AWS Shared VPC [1] scenario where AWS infrastructure resources (at least the Private Hosted Zone) belong to an account separate from the cluster installation target account.

Goals (aka. expected user outcomes)

As a user, I need to use a Shared VPC [1] when installing OpenShift on AWS into an existing VPC. This will at least require the use of a preexisting Route53 hosted zone, because as a "participant" user of the shared VPC I am not allowed to automatically create Route53 private zones.

Requirements (aka. Acceptance Criteria):

The Installer is able to successfully deploy OpenShift on AWS with a Shared VPC [1], and the cluster is able to successfully pass osde2e testing. This will include at least the scenario where the private hosted zone belongs to a different account (Account A) than the cluster resources (Account B).

[1] https://docs.aws.amazon.com/vpc/latest/userguide/vpc-sharing.html

OCP/Telco Definition of Done
Epic Template descriptions and documentation.


Links:

Enhancement PR: https://github.com/openshift/enhancements/pull/1397 

API PR: https://github.com/openshift/api/pull/1460 

Ingress Operator PR: https://github.com/openshift/cluster-ingress-operator/pull/928

Background

Feature Goal: Support OpenShift installation in AWS Shared VPC scenario where AWS infrastructure resources (at least the Private Hosted Zone) belong to an account separate from the cluster installation target account.

The ingress operator is responsible for creating DNS records in AWS Route53 for cluster ingress. Prior to the implementation of this epic, the ingress operator didn't have the capability to add DNS records into an existing Route 53 hosted zone in the shared VPC.

Epic Goal

  • Add support to the ingress operator for creating DNS records in preexisting Route53 private hosted zones for Shared VPC clusters

Non-Goals

  • Ingress operator support for day-2 operations (i.e. changes to the AWS IAM Role value after installation)  
  • E2E testing (will be handled by the Installer Team) 

Design

As described in the WIP PR https://github.com/openshift/cluster-ingress-operator/pull/928, the ingress operator will consume a new API field that contains the IAM Role ARN for configuring DNS records in the private hosted zone. If this field is present, then the ingress operator will use this account to create all private hosted zone records. The API fields will be described in the Enhancement PR.

The ingress operator code will accomplish this by defining a new provider implementation that wraps two other DNS providers, using one of them to publish records to the public zone and the other to publish records to the private zone.
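A hypothetical sketch of the field the ingress operator would consume; the authoritative field name and location are defined in the enhancement and API PRs linked above, so the field name and role ARN below are illustrative assumptions only:

```yaml
# Hypothetical sketch; see openshift/enhancements#1397 and openshift/api#1460
# for the real field definition. The field name and role ARN are illustrative.
apiVersion: config.openshift.io/v1
kind: DNS
metadata:
  name: cluster
spec:
  platform:
    type: AWS
    aws:
      privateZoneIAMRole: arn:aws:iam::111111111111:role/shared-vpc-private-zone-role
```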

External DNS Operator Impact

See NE-1299

AWS Load Balancer Operator (ALBO) Impact

See NE-1299

Why is this important?

  • Without this ingress operator support, OpenShift users are unable to create DNS records in a preexisting Route53 private hosted zone which means OpenShift users can't share the Route53 component with a Shared VPC
  • Shared VPCs are considered an AWS best practice

Scenarios

  1. ...

Acceptance Criteria

  • Unit tests must be written and automatically run in CI (E2E tests will be handled by the Installer Team)
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Ingress Operator creates DNS Records in preexisting Route53 private hosted zones for shared VPC Clusters
  • Network Edge Team has reviewed all of the related enhancements and code changes for Route53 in Shared VPC Clusters

Dependencies (internal and external)

  1. Installer Team is adding the new API fields required for enabling sharing of Route53 within Shared VPCs in https://issues.redhat.com/browse/CORS-2613
  2. Testing this epic requires having access to two AWS accounts

Previous Work (Optional):

  1. Significant discussion was done in this thread: https://redhat-internal.slack.com/archives/C68TNFWA2/p1681997102492889?thread_ts=1681837202.378159&cid=C68TNFWA2
  1. Slack channel #tmp-xcmbu-114

Open questions:

  1.  

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.


Epic Goal

  • Enable/confirm installation in AWS shared VPC scenario where Private Hosted Zone belongs to an account separate from the cluster installation target account

Why is this important?

  • AWS best practices suggest this setup

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

I want

  • the installer to check for appropriate permissions based on whether the installation is using an existing hosted zone and whether that hosted zone is in another account

so that I can

  • be sure that my credentials have sufficient and minimal permissions before beginning install

Acceptance Criteria:

Description of criteria:

  • When specifying platform.aws.hostedZoneRole, route53:CreateHostedZone and route53:DeleteHostedZone are not required
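A sketch of the Shared VPC portion of install-config.yaml this permission check would apply to; the hosted zone ID and role ARN are illustrative:

```yaml
# Sketch: Shared VPC install using a preexisting private hosted zone owned by
# another account. Zone ID and role ARN are illustrative.
platform:
  aws:
    region: us-east-1
    hostedZone: Z0123456789ABCDEFGHIJ
    hostedZoneRole: arn:aws:iam::111111111111:role/shared-vpc-hosted-zone-role
```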

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview (aka. Goal Summary)  

During oc login with a token, pasting the token on the command line with the oc login --token command is insecure. The token is logged in bash history, and appears in a "ps" command when run precisely at the time the oc login command runs. Moreover, the token gets logged and is searchable by any sysadmin.

Customers/Users would like either the "--web" option, or a command that prompts for a token. There should be no way to pass a secret on the command line with the --token option.

For environments where no web browser is available, a "--ask-token" option should be provided that prompts for a token instead of passing it on the command line.

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal*

During oc login with a token, pasting the token on the command line with the oc login --token command is insecure. The token is logged in bash history, and appears in a "ps" command when run precisely at the time the oc login command runs. Moreover, the token gets logged and is searchable by any sysadmin.

Customers/Users would like either the "--web" option, or a command that prompts for a token. There should be no way to pass a secret on the command line with the --token option.

For environments where no web browser is available, a "--ask-token" option should be provided that prompts for a token instead of passing it on the command line.

 
Why is this important? (mandatory)

Pasting the token on the command line with the oc login --token command is insecure

 
Scenarios (mandatory) 

Customers/Users would like the "--web" option. There should be no way to pass a secret on the command line with the --token option.

For environments where no web browser is available, a "--ask-token" option should be provided that prompts for a token instead of passing it on the command line.

 

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

 

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be "Release Pending" 

In order to secure token usage during oc login, we need to add the capability to oc to login using the OAuth2 Authorization Code Grant Flow through a browser. This will be possible by providing a command line option to oc:

oc login --web

In order for the OAuth2 Authorization Code Grant Flow to work in oc browser login, we need a new OAuthClient that can obtain tokens through PKCE (https://datatracker.ietf.org/doc/html/rfc7636), as the existing clients do not have this capability. The new client will be called openshift-cli-client and will have the loopback addresses as valid Redirect URIs.

In order for the OAuth2 Authorization Code Grant Flow to work in oc browser login, the OSIN server must ignore any port used in the Redirect URIs of the flow when the URIs are the loopback addresses. This has already been added to OSIN; we need to update the oauth-server to use the latest version of OSIN in order to make use of this capability.
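A sketch of what the new OAuthClient could look like; the grant method and redirect URI paths are illustrative assumptions, with the loopback addresses as described above:

```yaml
# Sketch only: field values are illustrative assumptions.
apiVersion: oauth.openshift.io/v1
kind: OAuthClient
metadata:
  name: openshift-cli-client
grantMethod: auto
redirectURIs:
- http://127.0.0.1/callback   # OSIN ignores the ephemeral port on loopback addresses
- http://[::1]/callback
```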

 

Review the OVN Interconnect proposal and figure out the work that needs to be done in ovn-kubernetes to be able to move to this new OVN architecture.

This is Phase 2 of the project, in continuation of what was delivered in the earlier release.

Why is this important?

OVN IC will be the model used in Hypershift. 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.


Epic Goal

  • ...

Why is this important?

See https://docs.google.com/presentation/d/17wipFv5wNjn1KfFZBUaVHN3mAKVkMgGWgQYcvss2yQQ/edit#slide=id.g547716335e_0_220 

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

In the non-IC world we have a centralised DB, so running a trace is easy. In the IC world, we'd need all the local DBs from each node to fully run a pod2pod trace; otherwise we can only run half traces with one side's DB.

Goal of this card:

  • Open a PR against the `oc` repo to gather all DBs (minimum requirement)

For interconnect upgrades - i.e. when moving from OCP 4.13 to OCP 4.14 where IC is enabled - we do a 2 phase rollout of ovnkube-master and ovnkube-node pods in the openshift-ovn-kubernetes namespace. This is to ensure we have minimum disruption since major architectural components are being brought from the control-plane down to the data-plane.

Since it's a two-phase rollout with each phase taking approximately 10 minutes, we effectively double the time it takes for the OVNK component to upgrade, which means increasing the timeout thresholds on AWS.

See https://redhat-internal.slack.com/archives/C050MC61LVA/p1689768779938889 for some more details.

See sample runs:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws-modern/1679589472833900544

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws-modern/1679589451010936832

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws-modern/1678480739743567872

I have noticed this happening once on GCP:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-gcp-modern/1680563737225859072

This has not happened on Azure, which has a 95-minute allowance. So this card tracks the work to increase the timers on AWS/GCP. This was brought up in the TRT team sync that happened yesterday (July 19th 2023) and Scott Dodson has agreed to approve this under the condition that we bring it back down to the current values in release 4.15.

SDN team is confident the time will drop back to normal for future upgrades going from 4.14 -> 4.15 and so on. This will be tracked via https://issues.redhat.com/browse/OTA-999 

Users want to create EFA instance MachineSets in the same AWS placement group to get the best network performance within that placement group.

The Scope of this Epic is only to support placement groups. Customers will create them.
The customer ask is that placement groups don't need to be created by the OpenShift Container Platform
OpenShift Container Platform only needs to be able to consume them and assign machines out of a machineset to a specific Placement Group.

Users want to create EFA instance MachineSets in the same AWS placement group to get the best network performance within that placement group.

Note: This Epic was previously connected to https://issues.redhat.com/browse/OCPPLAN-8106 and has been updated to OCPBU-327.

Scope

The Scope of this Epic is only to support placement groups. Customers will create them.
The customer ask is that placement groups don't need to be created by the OpenShift Container Platform
OpenShift Container Platform only needs to be able to consume them and assign machines out of a machineset to a specific Placement Group.

Background

In CAPI, the AWS provider supports the user supplying the name of a pre-existing placement group, which will then be used to create the instances.

https://github.com/kubernetes-sigs/cluster-api-provider-aws/pull/4273

We need to add the same field to our API and then pass the information through in the same way, to allow users to leverage placement groups.
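A hedged sketch of a MachineSet providerSpec referencing a pre-existing placement group, mirroring the upstream CAPA field linked above; the OpenShift field name, instance type, and EFA interface type below are assumptions for illustration, not the final API:

```yaml
# Sketch: field names mirror the upstream CAPA change and are assumptions here.
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: example-efa-machineset
spec:
  template:
    spec:
      providerSpec:
        value:
          apiVersion: machine.openshift.io/v1beta1
          kind: AWSMachineProviderConfig
          instanceType: c5n.18xlarge            # EFA-capable instance type (illustrative)
          networkInterfaceType: EFA             # request an Elastic Fabric Adapter
          placementGroupName: my-cluster-efa-pg # pre-existing placement group created by the user
```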

Steps

  • Review the upstream code linked above
  • Backport the feature
  • Drop old code for placement group controller that is currently disabled

Stakeholders

  • Cluster Infra

Definition of Done

  • Users may provide a pre-existing placement group name and have their instances created within that placement group
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

Key Objective
Providing our customers with a single simplified User Experience (Hybrid Cloud Console) that is extensible, can run locally or in the cloud, and is capable of everything from managing the fleet to deep diving into a single cluster.
Why do customers want this?

  1. Single interface to accomplish their tasks
  2. Consistent UX and patterns
  3. Easily accessible: One URL, one set of credentials

Why do we want this?

  • Shared code -  improve the velocity of both teams and most importantly ensure consistency of the experience at the code level
  • Pre-built PF4 components
  • Accessibility & i18n
  • Remove barriers for enabling ACM

Phase 2 Goal: Productization of the united Console 

  1. Enable user to quickly change context from fleet view to single cluster view
    1. Add Cluster selector with “All Cluster” Option. “All Cluster” = ACM
    2. Shared SSO across the fleet
    3. Hub OCP Console can connect to remote clusters API
    4. When ACM Installed the user starts from the fleet overview aka “All Clusters”
  2. Share UX between views
    1. ACM Search -> resource list across fleet -> resource details that are consistent with single cluster details view
    2. Add Cluster List to OCP -> Create Cluster

We need a way to show metrics for workloads running on spoke clusters. This depends on ACM-876, which lets the console discover the monitoring endpoints.

  • Console operator must discover the external URLs for monitoring
  • Console operator must pass the URLs and CA files as part of the cluster config to the console backend
  • Console backend must set up proxies for each endpoint (as it does for the API server endpoints)
  • Console frontend must include the cluster in metrics requests

Open Issues:

We will depend on ACM to create a route on each spoke cluster for the prometheus tenancy service, which is required for metrics for normal users.

 

Openshift console backend should proxy managed cluster monitoring requests through the MCE cluster proxy addon to prometheus services on the managed cluster. This depends on https://issues.redhat.com/browse/ACM-1188

 

This epic contains all the OLM related stories for OCP release-4.14

Epic Goal

  • Track all the stories under a single epic

Goal: OperatorHub/OLM users get a more intuitive UX around discovering and selecting Operator versions to install.

Problem statement: Today it's not possible to install an older version of an Operator unless the user knows the exact CSV semantic version. However, this version information is not exposed through any API; `packageserver` as of today only shows the latest version per channel.

Why is this important: There are many reasons why a user would choose not to install the latest version, whether it's lack of testing or known problems. It should be easy for a user to discover what versions of an Operator OLM has in its catalogs and update graphs, and this information should be exposed in a consumable way to the user.

Acceptance Criteria:

  • Users can choose from a list of "available versions" of an Operator based on the "selected channel" on the 'OperatorHub' page in the console.
  • Users can see/examine Operator metadata (e.g. descriptions, version, capability level, links, etc) per selected channel/version to confirm the exact version they are going to install on the OperatorHub page.
  • The selected channel/version info will be carried over from the 'OperatorHub' page to 'Install Operator' page in the console.
  • Note that "installing an older version" means "no automatic update"; hence, when users select a non-latest Operator version, this implies the "Update" field would be changed to "Manual".
  • Operator details sidebar data (`createdAt`, `containerImage`, and `capability level`) will update based on the selected channel/version.

Out of scope:

  • provide a version selector for updates of already installed Operators

 

Related info

UX designs: http://openshift.github.io/openshift-origin-design/designs/administrator/olm/select-install-operator-version/
linked OLM jira: https://issues.redhat.com/browse/OPRUN-1399
where you can see the downstream PR: https://github.com/openshift/operator-framework-olm/pull/437/files
specifically: https://github.com/awgreene/operator-framework-olm/blob/f430b2fdea8bedd177550c95ec[…]r/pkg/package-server/apis/operators/v1/packagemanifest_types.go, i.e., you can get a list of available versions from the PackageChannel stanza of the packagemanifest API
You can reach out to OLM lead Alex Greene with any questions regarding this too, thanks
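For orientation, here is a hedged sketch of reading per-channel version entries from the packagemanifest API with a dynamic client; the status.channels[].entries layout follows the linked downstream PR and the exact field names should be treated as assumptions.

```go
// Hedged sketch: listing the per-channel version entries of a packagemanifest.
// The status.channels[].entries layout is assumed from the linked PR.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

var packageManifestGVR = schema.GroupVersionResource{
	Group:    "packages.operators.coreos.com",
	Version:  "v1",
	Resource: "packagemanifests",
}

func listChannelVersions(ctx context.Context, client dynamic.Interface, namespace, pkg string) error {
	pm, err := client.Resource(packageManifestGVR).Namespace(namespace).Get(ctx, pkg, metav1.GetOptions{})
	if err != nil {
		return err
	}
	channels, _, err := unstructured.NestedSlice(pm.Object, "status", "channels")
	if err != nil {
		return err
	}
	for _, c := range channels {
		channel, _ := c.(map[string]interface{})
		name, _, _ := unstructured.NestedString(channel, "name")
		entries, _, _ := unstructured.NestedSlice(channel, "entries")
		fmt.Printf("channel %q has %d candidate versions to install\n", name, len(entries))
	}
	return nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(cfg)
	// "etcd" is just an example package name.
	if err := listChannelVersions(context.Background(), client, "openshift-marketplace", "etcd"); err != nil {
		panic(err)
	}
}
```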

 

 

The console operator should build up a set of the cluster nodes' OS types and supply it to the console, so that the console renders only Operators that can be installed on the cluster.

This will be needed when we support different OS types on the cluster.

We need to scan through the compute nodes and build a set of the supported OSes from those. Each node on the cluster has a label for its operating system, e.g. kubernetes.io/os=linux.

 

AC:

  1. Implement logic in the console-operator that will scan through all the nodes, build a set of all the OS types that the cluster nodes run on, and pass it to the console-config.yaml. This set of OS types will then be used by the console frontend (see the sketch after this list).
  2. Add unit and e2e test cases in the console-operator repository.
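A minimal sketch of the node-scanning step, assuming in-cluster client-go access; function names are illustrative and the real console-operator code would feed the result into console-config.yaml rather than print it.

```go
// Sketch only: build the set of operating systems present on cluster nodes
// from the kubernetes.io/os node label.
package main

import (
	"context"
	"fmt"
	"sort"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func nodeOperatingSystems(ctx context.Context, client kubernetes.Interface) ([]string, error) {
	nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	set := map[string]struct{}{}
	for _, node := range nodes.Items {
		if os := node.Labels["kubernetes.io/os"]; os != "" {
			set[os] = struct{}{}
		}
	}
	out := make([]string, 0, len(set))
	for os := range set {
		out = append(out, os)
	}
	sort.Strings(out) // stable output for the console config
	return out, nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	oses, err := nodeOperatingSystems(context.Background(), client)
	if err != nil {
		panic(err)
	}
	fmt.Println("node operating systems:", oses)
}
```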

1. Proposed title of this feature request

    Add a scroll bar for the resource list in the Uninstall Operator pop-up window
2. What is the nature and description of the request?

   To make it easy for users to check the list of all resources
3. Why does the customer need this? (List the business requirements here)

   One Operator may have multiple resources; a scroll bar would make it easy for customers to check them all in the Uninstall Operator pop-up window
4. List any affected packages or components.

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

The console operator should build up a set of the cluster nodes' OS types and supply it to the console, so that the console renders only Operators that can be installed on the cluster.

This will be needed when we support different OS types on the cluster.

We need to scan through the compute nodes and build a set of the supported OSes from those. Each node on the cluster has a label for its operating system, e.g. kubernetes.io/os=linux.

 

AC:

  1. Implement logic in the console repo
    1. Add an additional flag
    2. Populate the supported OS types into SERVER_FLAGS
    3. Update the filtering logic in OperatorHub

BU Priority Overview

Initiative: Improve etcd disaster recovery experience (part1)

Goals

The current etcd backup and recovery process is described in our docs https://docs.openshift.com/container-platform/4.12/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html

The current process leaves it up to the cluster-admin to figure out a way to do consistent backups following the documented procedure.

This feature is part of a progressive delivery to improve the cluster-admin experience for backup and restore of etcd clusters to a healthy state.

Scope of this feature:

  • etcd quorum loss (2-node failure) on a 3-node OCP control plane
  • etcd degradation (1-node failure) on a 3-node OCP control plane

Execution Plans

  • Improve etcd disaster recovery e2e test coverage
  • Design automated backup API. Initial target is local destination
  • Should provide a way (e.g. a script or tool) for the cluster-admin to validate that backup files remain valid over time (e.g. to account for disk failures corrupting the backup)
  • Should document updated manual steps to restore from a local backup. These steps should be part of the e2e test coverage.
  • Should document manual steps to copy backup files to a destination outside the cluster (e.g. an ssh copy a cluster-admin can use in a CronJob)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Given that we have a controller that processes one-time etcd backup requests via the "operator.openshift.io/v1alpha1 EtcdBackup" CR, we need another controller that processes the "config.openshift.io/v1alpha1 Backup" CR so we can have periodic backups according to the schedule in the CR spec.

See https://github.com/openshift/api/pull/1482 for the APIs

The workflow for this controller should roughly be:

  • Watches the `config.openshift.io/v1alpha1 Backup` CR as created by an admin
  • Creates a CronJob for the specified schedule and timezone that would in turn create `operator.openshift.io/v1alpha1 EtcdBackup` CRs at the desired schedule
  • Updates the CronJob for any changes in the schedule or timezone

Along with this controller we would also need to provide the workload or Go command for the pod that is created periodically by the CronJob. This command, e.g. "create-etcdbackup-cr", effectively creates a new `operator.openshift.io/v1alpha1 EtcdBackup` CR via the following workflow (a minimal sketch follows the list):

  • Read the Backup CR to get the pvcName (and anything else) required to populate an `EtcdBackup` CR
  • Create the `operator.openshift.io/v1alpha1 EtcdBackup` CR
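A minimal sketch of that step, assuming the EtcdBackup spec carries a pvcName field as in the linked API PR; names and the exact spec shape are assumptions.

```go
// Hedged sketch of the "create-etcdbackup-cr" step: create a one-time
// EtcdBackup CR from the information read out of the cluster-scoped Backup CR.
// The spec.pvcName field is assumed from the linked API PR.
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

var etcdBackupGVR = schema.GroupVersionResource{
	Group:    "operator.openshift.io",
	Version:  "v1alpha1",
	Resource: "etcdbackups",
}

func createEtcdBackup(ctx context.Context, client dynamic.Interface, pvcName string) error {
	backup := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "operator.openshift.io/v1alpha1",
		"kind":       "EtcdBackup",
		"metadata": map[string]interface{}{
			// unique name per invocation so the CronJob can fire repeatedly
			"name": fmt.Sprintf("backup-%d", time.Now().Unix()),
		},
		"spec": map[string]interface{}{
			"pvcName": pvcName, // assumed field name, see the API PR
		},
	}}
	_, err := client.Resource(etcdBackupGVR).Create(ctx, backup, metav1.CreateOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	if err := createEtcdBackup(context.Background(), dynamic.NewForConfigOrDie(cfg), "etcd-backup-pvc"); err != nil {
		panic(err)
	}
}
```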

Lastly, to fulfill the retention policy (None, number of backups saved, or total size of backups), we can employ the following workflow:

  • Have another command, e.g. a "prune-backups" cmd, that runs prior to the "create-etcdbackup-cr" command and deletes existing backups per the retention policy.
  • This cmd is run before the cmd that creates the EtcdBackup CR. This could be done via an init container on the CronJob execution pod.
  • This would require the backup controller to populate the CronJob spec with the PVC name from the Backup spec, allowing the PV to be mounted on the execution pod for pruning the backups in the init container.

See the parent story for more context.
As the first part of this story we need a controller with the following workflow:

  • Watches the `config.openshift.io/v1alpha1 Backup` CR as created by an admin
  • Creates a CronJob for the specified schedule and timezone that would ultimately create `operator.openshift.io/v1alpha1 EtcdBackup` CRs at the desired schedule
  • Updates the CronJob for any changes in the schedule or timezone

Since we also want to preserve a history of successful and failed backup attempts for the periodic config, the CronJob should utilize cronjob history limits to preserve successful and failed jobs.
https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#jobs-history-limits

To begin with we can set this to a reasonable default of 5 successful and 10 failed jobs.
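A sketch of the CronJob the periodic-backup controller could create from the Backup CR's schedule and timezone, carrying the history limits above; the image, command, and namespace are placeholders, not the operator's actual manifest.

```go
// Sketch of the CronJob built from a config.openshift.io/v1alpha1 Backup CR,
// with the 5 successful / 10 failed history limits discussed above.
// Image, command, and namespace are placeholders.
package backup

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/pointer"
)

func backupCronJob(schedule, timeZone string) *batchv1.CronJob {
	return &batchv1.CronJob{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "etcd-backup",
			Namespace: "openshift-etcd",
		},
		Spec: batchv1.CronJobSpec{
			Schedule:                   schedule,
			TimeZone:                   &timeZone,
			SuccessfulJobsHistoryLimit: pointer.Int32(5),
			FailedJobsHistoryLimit:     pointer.Int32(10),
			JobTemplate: batchv1.JobTemplateSpec{
				Spec: batchv1.JobSpec{
					Template: corev1.PodTemplateSpec{
						Spec: corev1.PodSpec{
							RestartPolicy: corev1.RestartPolicyNever,
							Containers: []corev1.Container{{
								Name:    "create-etcdbackup-cr",
								Image:   "cluster-etcd-operator:latest", // placeholder image
								Command: []string{"cluster-etcd-operator", "create-etcdbackup-cr"},
							}},
						},
					},
				},
			},
		},
	}
}
```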

 

Lastly, to fulfill the retention policy (None, number of backups saved, or total size of backups), we can employ the following workflow:

  • Have another command, e.g. a "prune-backups" cmd, that runs prior to the "create-etcdbackup-cr" command and deletes existing backups per the retention policy.
  • The retention policy type can either be read from the `config.openshift.io/v1alpha1 Backup` CR
    • Or, easier yet, the backup controller can set the retention policy arg in the CronJob template spec
  • This cmd is run before the cmd that creates the EtcdBackup CR. This could be done via an init container on the CronJob execution pod.
  • This would require the backup controller to populate the CronJob spec with the PVC name from the Backup spec, allowing the PV to be mounted on the execution pod for pruning the backups in the init container.

For testing the automated backups feature we will require an e2e test that validates the backups by ensuring the restore procedure works for a quorum loss disaster recovery scenario.

See the following doc for more background:
https://docs.google.com/document/d/1NkdOwo53mkNBCktV5tkUnbM4vi7bG4fO5rwMR0wGSw8/edit?usp=sharing

This story targets milestones 2, 3, and 4 of the restore test to ensure that the test has the ability to perform a backup and then restore from that backup in a disaster recovery scenario.

While the automated backups API is still in progress, the test will rely on the existing backup script to trigger a backup. Later on when we have a functional backup API behind a feature gate, the test can switch over to using that API to trigger backups.

We're starting with a basic crash-looping member restore; the quorum loss scenario will be done in ETCD-423.

We should add some basic backup e2e tests into our operator:

  • one off backups can be run via API
  • periodic backups can be run (also multiple times in succession)
    • retention should work

The e2e workflow should already be TechPreview-enabled.

 

For testing the automated backups feature we will require an e2e test that validates the backups by ensuring the restore procedure works for a quorum loss disaster recovery scenario.

See the following doc for more background:
https://docs.google.com/document/d/1NkdOwo53mkNBCktV5tkUnbM4vi7bG4fO5rwMR0wGSw8/edit?usp=sharing

This story targets the first milestone of the restore test to ensure we have a platform agnostic way to be able to ssh access all masters in a test cluster so that we can perform the necessary backup, restore and validation workflows.

The suggested approach is to create a static pod that can do those ssh checks and actions from within the cluster, but other alternatives can also be explored as part of this story.

To fulfill one-time backup requests there needs to be a new controller that reconciles an EtcdBackup CustomResource (CR) object and executes and saves a one-time backup of the etcd cluster.
 
Similar to the upgradebackupcontroller, the controller would be triggered to create a backup pod/job which would save the backup to the PersistentVolume specified by the spec of the EtcdBackup CR object.

The controller would also need to honor the retention policy specified by the EtcdBackup spec and update the status accordingly.

See the following enhancement and API PRs for more details and potential updates to the API and workflow for the one time backup:
https://github.com/openshift/enhancements/pull/1370
https://github.com/openshift/api/pull/1482

Feature Overview

This feature aims to enhance and clarify the functionalities of the Hypershift CLI. It was initially developed as a developer tool, but as its purpose evolved, a mix of supported and unsupported features were included. This has caused confusion for users who attempt to utilize unsupported functionalities. The goal is to clearly define the boundaries of what is possible and what is supported by the product.

Goals

Users should be able to effectively and efficiently use the Hypershift CLI with a clear understanding of what features are supported and what are not. This should reduce confusion and complications when utilizing the tool.

Requirements (aka. Acceptance Criteria):

Clear differentiation between supported and unsupported functionalities within the Hypershift CLI.
Improved documentation outlining the supported CLI options.
Consistency between the Hypershift CLI and the quickstart guide on the UI.
Security, reliability, performance, maintainability, scalability, and usability must not be compromised while implementing these changes.

Use Cases (Optional):

A developer uses the hypershift install command and only supported features are executed.
A user attempts to create a cluster using hypershift cluster create, and the command defaults to a compatible release image.

Questions to Answer (Optional):

What is the most efficient method for differentiating supported and unsupported features within the Hypershift CLI?
What changes need to be made to the documentation to clearly outline supported CLI options?

Out of Scope

Changing the fundamental functionality of the Hypershift CLI.
Adding additional features beyond the scope of addressing the current issues.

Background

The Hypershift CLI started as a developer tool but evolved to include a mix of supported and unsupported features. This has led to confusion among users and potential complications when using the tool. This feature aims to clearly define what is and isn't supported by the product.

Customer Considerations

Customers should be educated about the changes to the Hypershift CLI and its intended use. Clear communication about supported and unsupported features will help them utilize the tool effectively.

Documentation Considerations

Documentation should be updated to clearly outline supported CLI options. This will be a crucial part of user education and should be easy to understand and follow.

Interoperability Considerations

This feature may impact the usage of Hypershift CLI across other projects and versions. A clear understanding of these impacts and planning for necessary interoperability test scenarios should be factored in during development.

Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

As a self-managed HyperShift user I want to have a CLI tool that allows me to:

  • Create the necessary HyperShift API custom resources for hosted cluster creation

Definition of done:

  • hypershift create cluster agent exists and has only the relevant flags needed for what we support on bare metal with the cluster-api agent provider
  • Unit tests
  • cluster creation test plan in QE (ECODEPQE pipeline)

As a HyperShift user I want to:

  • Have a convenient command that destroys a kubevirt cluster I deployed

Definition of done:

  • hypershift destroy cluster kubevirt exists and destroys a Kubevirt hosted cluster
  • QE test plan that uses the destroy cluster kubevirt command

As a self-managed HyperShift user I want to have a CLI tool that allows me to:

  • Create the necessary HyperShift API custom resources for hosted cluster creation

Definition of done:

  • hypershift create cluster aws exists and has only the relevant flags needed for what we support on AWS
  • Unit tests
  • cluster creation test plan in QE

As a HyperShift user I want to:

  • Have a convenient command that destroys an AWS cluster I deployed

Definition of done:

  • hypershift destroy cluster aws exists and destroys an AWS hosted cluster
  • QE test plan that uses the destroy cluster aws command

User Story:

As a user of HCP CLI, I want to be able to set some platform agnostic default flags when creating a HostedCluster:

  • additional-trust-bundle
  • annotations
  • arch
  • auto-repair
  • base-domain
  • cluster-cidr
  • control-plane-availability-policy
  • etcd-storage-class
  • fips
  • generate-ssh
  • image-content-sources
  • infra-availability-policy
  • infra-id
  • infra-json
  • name
  • namespace
  • node-drain-timeout
  • node-selector
  • node-upgrade-type
  • network-type
  • release-stream
  • render
  • service-cidr
  • ssh-key
  • timeout
  • wait

so that I can set default values for these flags for my particular use cases.

Acceptance Criteria:

Description of criteria:

  • Aforementioned flags are included in the HCP CLI general create cluster command.
  • Aforementioned flags are included in test plans & testing.

Out of Scope:

The flags listed in HyperShift Create Cluster CLI that don't seem platform agnostic:

  • BaseDomainPrefix - only in AWS
  • ExternalDNSDomain - only in AWS

These flags are also out of scope:

  • control-plane-operator-image - for devs (see Alberto's comment below)

Engineering Details:

  • N/A

This requires/does not require a design proposal.
This requires/does not require a feature gate.

As a software developer and user of HyperShift CLI, I would like a prototype of how the Makefile can be modified to build different versions of the HyperShift CLI, i.e., dev version vs productized version.

As a self-managed HyperShift user I want to have a CLI tool that allows me to:

  • Create the necessary HyperShift API custom resources for hosted cluster Nodepool creation

Definition of done:

  • hypershift create nodepool agent exists and has only the relevant flags needed for what we support on bare metal with the cluster-api agent provider
  • Unit tests
  • cluster creation with nodepool creation test plan in QE (ECODEPQE pipeline)

As a self-managed HyperShift user I want to have a CLI tool that allows me to:

  • Create the necessary HyperShift API custom resources for hosted cluster Nodepool creation

Definition of done:

  • hypershift create nodepool kubevirt exists and has only the relevant needed flags for what we support in kubevirt
  • Unit tests
  • cluster creation with nodepool creation test plan in QE (ECODEPQE pipeline)

As a HyperShift user I want to:

  • Have a convenient command that generates the kubeconfig file to access the hosted cluster I just deployed

Definition of done:

  • hypershift kubeconfig create exists and generates a kubeconfig file that is valid to access the deployed hosted cluster
  • QE test plan that uses the kubeconfig generation

As a HyperShift user I want to:

  • Have a convenient command that destroys an Agent cluster I deployed

Definition of done:

  • hypershift destroy cluster agent exists and destroys an agent hosted cluster
  • QE test plan that uses the destroy cluster agent command

As a self-managed HyperShift user I want to have a CLI tool that allows me to:

  • Create the necessary HyperShift API custom resources for hosted cluster Nodepool creation

Definition of done:

  • hypershift create nodepool aws exists and has only the relevant needed flags for what we support in AWS
  • Unit tests
  • cluster creation with nodepool creation test plan in QE

As a self-managed HyperShift user I want to have a CLI tool that allows me to:

  • Create the necessary HyperShift API custom resources for hosted cluster creation

Definition of done:

  • hypershift create cluster kubevirt exists and has only the relevant needed flags for what we support in Kubevirt
  • Unit tests
  • cluster creation test plan in QE (ECODEPQE pipeline)

< High-Level description of the feature ie: Executive Summary >

Goals

< Who benefits from this feature, and how? What is the difference between today's current state and a world with this feature? >

Requirements

Requirements Notes IS MVP
     
    • (Optional) Use Cases

< What are we making, for who, and why/what problem are we solving?>

Out of scope

<Defines what is not included in this story>

Dependencies

< Link or at least explain any known dependencies. >

Background, and strategic fit

< What does the person writing code, testing, documenting need to know? >

Assumptions

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Customer Considerations

< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

Documentation Considerations

< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >

What does success look like?

< Does this feature have doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact?>

QE Contact

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Impact

< If the feature is ordered with other work, state the impact of this feature on the other work>

Related Architecture/Technical Documents

<links>

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment

What's the problem

Currently the pipeline builder in the Dev console directly queries the Tekton Hub APIs for searching tasks. As the upstream community and Red Hat are moving to Artifact Hub, we need to query the Artifact Hub API for searching tasks.

Acceptance criteria

  1. Update the pipeline builder code so that if the API to retrieve tasks is not available, there will be no errors in the UI.
  2. Perform a spike to estimate the amount of work it will take to have the pipeline builder use the artifact hub API to retrieve tasks, rather than using the tekton hub API.

Description

Hitting the Artifacthub.io search endpoint sometimes fails due to a CORS error, and the Version API endpoint always fails due to a CORS error. So, we need a proxy to hit the Artifacthub.io endpoint to get the data (a minimal proxy sketch follows the endpoint examples below).

Acceptance Criteria

  1. Create a proxy to hit the Artifacthub.io endpoint.

Additional Details:

Search endpoint: https://artifacthub.io/docs/api/#/Packages/searchPackages

eg.: https://artifacthub.io/api/v1/packages/search?offset=0&limit=20&facets=false&ts_query_web=git&kind=7&deprecated=false&sort=relevance

Version endpoint: https://artifacthub.io/docs/api/#/Packages/getTektonTaskVersionDetails

eg: https://artifacthub.io/api/v1/packages/tekton-task/tekton-catalog-tasks/git-clone/0.9.0
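A minimal sketch of such a proxy, using the search endpoint quoted above; the handler path and server wiring are illustrative, not the console's actual proxy implementation.

```go
// Illustrative server-side proxy for the Artifact Hub search endpoint so the
// browser talks to a same-origin path and avoids CORS.
package main

import (
	"io"
	"net/http"
	"net/url"
)

func artifactHubSearch(w http.ResponseWriter, r *http.Request) {
	// Forward the caller's query string to artifacthub.io, e.g.
	// ?offset=0&limit=20&facets=false&ts_query_web=git&kind=7&deprecated=false&sort=relevance
	upstream := url.URL{
		Scheme:   "https",
		Host:     "artifacthub.io",
		Path:     "/api/v1/packages/search",
		RawQuery: r.URL.RawQuery,
	}
	resp, err := http.Get(upstream.String())
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()
	w.Header().Set("Content-Type", resp.Header.Get("Content-Type"))
	w.WriteHeader(resp.StatusCode)
	_, _ = io.Copy(w, resp.Body) // response is now same-origin for the browser
}

func main() {
	// Hypothetical console path for the proxy.
	http.HandleFunc("/api/dev-console/artifacthub/search", artifactHubSearch)
	_ = http.ListenAndServe(":8080", nil)
}
```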

 

Feature Overview (aka. Goal Summary):

 

This feature will allow an x86 control plane to operate with compute nodes of type Arm in a HyperShift environment.

 

Goals (aka. expected user outcomes):

 

Enable an x86 control plane to operate with an Arm data-plane in a HyperShift environment.

 

Requirements (aka. Acceptance Criteria):

 

  • The feature must allow an x86 control plane and an Arm data-plane to be used together in a HyperShift environment.
  • The feature must provide documentation on how to set up and use the x86 control plane with an Arm data-plane in a HyperShift environment.
  • The feature must be tested and verified to work reliably and securely in a production environment.

 

Customer Considerations:

 

Customers who require a mix of x86 control plane and Arm data-plane for their HyperShift environment will benefit from this feature.

 

Documentation Considerations:

 

  • Documentation should include clear instructions on how to set up and use the x86 control plane with an Arm data-plane in a HyperShift environment.
  • Documentation will live on docs.openshift.com

 

Interoperability Considerations:

 

This feature should not impact other OpenShift layered products and versions in the portfolio.

Goal

Numerous partners are asking for ways to pre-image servers in some central location before shipping them to an edge site where they can be configured as an OpenShift cluster: OpenShift-based Appliance.

A number of these cases are a good fit for a solution based on writing an image equivalent to the agent ISO, but without the cluster configuration, to disk at the central location and then configuring and running the installation when the servers reach their final location. (Notably, some others are not a good fit, and will require OpenShift to be fully installed, using the Agent-based installer or another, at the central location.)

While each partner will require a different image, usually incorporating some of their own software to drive the process as well, some basic building blocks of the image pipeline will be widely shared across partners.

Extended documentation

OpenShift-based Appliance

Building Blocks for Agent-based Installer Partner Solutions

Interactive Workflow work (OCPBU-132)

This work must "avoid conflict with the requirements for any future interactive workflow (see Interactive Agent Installer), and build towards it where the requirements coincide. This includes a graphical user interface (future assisted installer consistency).

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Allow the user to use the openshift-installer to generate a configuration ISO that they can attach to a server running the unconfigured agent ISO from AGENT-558. This would act as alternative to the GUI, effectively leaving the interactive flow and rejoining the automation flow by doing an automatic installation using the configuration contained on the ISO.

Why is this important?

  • Helps standardise implementations of the automation flow where an agent ISO image is pre-installed on a physical disk.

Scenarios

  1. The user purchases hardware with a pre-installed unconfigured agent image. They use openshift-installer to generate a config ISO from an install config, and attach this ISO as virtual media to a group of servers to cause them to install OpenShift and form a cluster.
  2. The user has a pool of servers that share the same boot mechanism (e.g. PXE). Each server is booted from a common interactive agent image, and automation can install any subset of them as a cluster by attaching the same configuration ISO to each.
  3. A cloud user could boot a group of VMs using a publicly-available unconfigured agent image (e.g. an AMI), and install them as a cluster by attaching a configuration ISO to them.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. AGENT-556 - we'll need to block startup of services until configuration is provided
  2. AGENT-558 - this won't be useful without an unconfigured image to use it with
  3. AGENT-560 - enables AGENT-556 to block in an image generated with AGENT-558

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Implement a systemd service in the unconfigured agent ISO (AGENT-558) that watches for disks to be mounted, then searches them for agent installer configuration. If such configuration is found, then copy it to the relevant places in the running system.

The rendezvousIP must be copied last, as the presence of this is what will trigger the services to start (AGENT-556).

To the extent possible, the service should be agnostic as to the method by which the config disk was mounted (e.g. virtual media, USB stick, floppy disk, &c.). It may be possible to get systemd to trigger on volume mount, avoiding the need to poll anything.

The configuration drive must contain:

  • rendezvousIP config file
  • ClusterDeployment manifest
  • AgentPullSecret manifest
  • AgentClusterInstall manifest
  • TLS certs for admin kubeconfig
  • password hash for kubeadmin console password
  • ClusterImageSet manifest (for version verification)

it may optionally contain:

  • NMStateConfig
  • extra manifests
  • hostnames
  • hostconfig (roles, root device hints)

The ClusterImageSet manifest must match the one already present in the image for the config to be accepted.
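A rough sketch of the copy step only, assuming the config drive is already mounted; file names and destination paths are illustrative, and the real service would be driven by systemd rather than a standalone binary.

```go
// Sketch of the copy logic: copy everything except the rendezvousIP file
// first, and the rendezvousIP file last, since its presence is what triggers
// the agent services to start. Paths and the rendezvous file name are
// illustrative assumptions.
package main

import (
	"io"
	"os"
	"path/filepath"
)

const rendezvousFile = "rendezvous-host.env" // illustrative name

func copyFile(src, dst string) error {
	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()
	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, in)
	return err
}

func applyConfigDrive(mountDir, destDir string) error {
	entries, err := os.ReadDir(mountDir)
	if err != nil {
		return err
	}
	// 1. everything except the rendezvousIP config
	for _, e := range entries {
		if e.IsDir() || e.Name() == rendezvousFile {
			continue
		}
		if err := copyFile(filepath.Join(mountDir, e.Name()), filepath.Join(destDir, e.Name())); err != nil {
			return err
		}
	}
	// 2. the rendezvousIP config last, which unblocks the agent services
	src := filepath.Join(mountDir, rendezvousFile)
	if _, err := os.Stat(src); err == nil {
		return copyFile(src, filepath.Join(destDir, rendezvousFile))
	}
	return nil
}

func main() {
	// Illustrative mount point and destination directory.
	if err := applyConfigDrive("/media/config", "/etc/assisted"); err != nil {
		os.Exit(1)
	}
}
```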

Add a new installer subcommand, openshift-install agent create config-image.

This should create a small ISO (i.e. not a CoreOS boot image) containing just the configuration files from the automation flow:

  • rendezvousIP config file
  • ClusterDeployment manifest
  • AgentPullSecret manifest
  • AgentClusterInstall manifest
  • TLS certs for admin kubeconfig
  • password hash for kubeadmin console password
  • NMStateConfig
  • extra manifests
  • hostnames
  • hostconfig (roles, root device hints)
  • ClusterImageSet manifest (for version verification)

The contents of the disk could be in any format, but should be optimised to make it simple for the service in AGENT-562 to read.

Support pd-balanced disk types for GCP deployments

OpenShift installer and Machine API should support creation and management of computing resources with disk type "pd-balanced"

Why does the customer need this?

  • pd-balanced disks are SSD-backed disks with performance comparable to pd-ssd but at a lower price

Epic Goal

  • Support pd-balanced disk types for GCP deployments

Why is this important?

  • Customers will be able to reduce costs on GCP while using `pd-balanced` disk types with a comparable performance to `pd-ssd` ones.

Scenarios

  1. Enable `pd-balanced` disk types when deploying a cluster in GCP. Right now only `pd-ssd` and `pd-standard` are supported.

Overview:

  • To enable support for pd-balanced disk types during cluster deployment in Google Cloud Platform (GCP) for Openshift Installer.
  • Currently, only pd-ssd and pd-standard disk types are supported.
  • `pd-balanced` disks on GCP will offer cost reduction and performance comparable to `pd-ssd` disks, providing increased flexibility for deployments (see the validation sketch after this list).
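A minimal sketch of the kind of validation change involved, extending the accepted GCP disk types to include pd-balanced; this is illustrative and the installer's real validation code may be organized differently.

```go
// Illustrative validation helper: accept pd-balanced alongside the existing
// GCP disk types. Package and function names are assumptions.
package gcpvalidation

import "fmt"

var supportedDiskTypes = map[string]struct{}{
	"pd-standard": {},
	"pd-ssd":      {},
	"pd-balanced": {}, // newly allowed
}

// ValidateDiskType returns an error if the requested disk type is not supported.
func ValidateDiskType(diskType string) error {
	if diskType == "" {
		return nil // fall back to the installer default
	}
	if _, ok := supportedDiskTypes[diskType]; !ok {
		return fmt.Errorf("unsupported GCP disk type %q, expected one of pd-standard, pd-ssd, pd-balanced", diskType)
	}
	return nil
}
```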

Acceptance Criteria:

  • The Openshift Installer should be updated to include pd-balanced as a valid disk type option in the installer configuration process.
  • When pd-balanced disk type is selected during cluster deployment, the installer should handle the configuration of the disks accordingly.
  • CI (Continuous Integration) must be running successfully with tests automated.
  • Release Technical Enablement details and documents should be provided.

Done Checklist:

  • CI is running, tests are automated, and merged.
  • Release Enablement Presentation: [link to Feature Enablement Presentation].
  • Upstream code and tests merged: [link to meaningful PR or GitHub Issue].
  • Upstream documentation merged: [link to meaningful PR or GitHub Issue].
  • Downstream build attached to advisory: [link to errata].
  • Test plans in Polarion: [link or reference to Polarion].
  • Automated tests merged: [link or reference to automated tests].
  • Downstream documentation merged: [link to meaningful PR].

Dependencies:

  • Google Cloud Platform Account
  • Access to GCP ‘Installer’ Project
  • Any required permissions, authentication, access controls or CLI needed to provision pd-balanced disk types should be properly configured.

Testing:

  • Develop and conduct test cases and scenarios to verify the proper functioning of pd-balanced disk type implementation.
  • Address any bugs or issues identified during testing.

Documentation:

  • Update documentation to reflect the support for pd-balanced disk types in GCP deployments.

Success Metrics:

  • Successful deployment of Openshift clusters using the pd-balanced disk type in GCP.
  • Minimal or no disruption to existing functionality and deployment options.

Feature Overview

  • Enable user custom RHCOS images location for Installer IPI provisioned OpenShift clusters on Google Cloud and Azure

Goals

  • The Installer should accept custom locations for RHCOS images while deploying OpenShift on Google Cloud and Azure, as we already support for AWS via `platform.aws.amiID`, for control plane and compute nodes.
  • As a user, I want to be able to specify a custom RHCOS image location to be used for control plane and compute nodes while deploying OpenShift on Google Cloud and Azure, so that I can be compliant with my company's security policies.

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES
  •  

Background, and strategic fit

Many enterprises have strict security policies where all software must be pulled from a trusted or private source. For these scenarios, the RHCOS image used to bootstrap the cluster usually comes from shared public locations that some companies don't accept as a trusted source.

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Simplify ARO's workflow by allowing Azure marketplace images to be specified in the `install-config.yaml` for all nodes (compute, control plane, and bootstrap).

Why is this important?

  • ARO is a first party Azure service and has a number of requirements/restrictions. These requirements include the following: it must not request anything from outside of Azure and it must consume RHCOS VM images from a trusted source (marketplace).
  • At the same time upstream OCP does the following:
    1. It uses quay.io to get container images.
    2. Uses a random blob as a RHCOS VM image such as this. This VHD blob is then uploaded by the Installer to an Image Gallery in the user’s Storage Account where two boot images are created: a HyperV gen1 and a HyperV gen2. See here.
      To meet the requirements ARO team currently does the following as part of the release process:
    1. Mirror container images from quay.io to Azure Container Registry to avoid leaving Azure boundaries.
    2. Copy VM image from the blob in someone else's Azure subscription into the blob on the subscription ARO team manages and then publish a VM image on Azure Marketplace (publisher: azureopenshift, offer: aro4. See az vm image list --publisher azureopenshift --all). ARO does not bill for these images.
  • ARO has to carry their own changes on top of the Installer code to allow them to specify their own images for the cluster deployment.

Scenarios

  1. ...

Acceptance Criteria

  • Custom RHCOS images can be specified in the install-config for compute, controlPlane and defaultMachinePlatform and they are used for the installation instead of the default RHCOS VHD.

Out of scope

  • A VHD blob will still be uploaded to the user's Storage Account even though it won't be used during installation. That cannot be changed for now.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of problem:

ARO needs to copy RHCOS image blobs to their own Azure Marketplace offering since, as a first party Azure service, they must not request anything from outside of Azure and must consume RHCOS VM images from a trusted source (marketplace).
To meet the requirements ARO team currently does the following as part of the release process:

 1. Mirror container images from quay.io to Azure Container Registry to avoid leaving Azure boundaries.
 2. Copy VM image from the blob in someone else's Azure subscription
 into the blob on the subscription ARO team manages and then we publish a VM image on Azure Marketplace (publisher: azureopenshift, offer: aro4. See az vm image list --publisher azureopenshift --all). We do not bill for these images.

The usage of Marketplace images in the installer was already implemented as part of CORS-1823. This single line [1] needs to be refactored to enable ARO from the installer code perspective: on ARO we don't need to set type to AzureImageTypeMarketplaceWithPlan.

However, in OCPPLAN-7556 and related CORS-1823 it was mentioned that using Marketplace images is out of scope for nodes other than compute. For ARO we need to be able to use marketplace images for all nodes.

[1] https://github.com/openshift/installer/blob/f912534f12491721e3874e2bf64f7fa8d44aa7f5/pkg/asset/machines/azure/machines.go#L107
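A hedged illustration of the requested refactor: pick the marketplace image type based on whether the image carries purchase-plan terms instead of hard-coding the WithPlan variant. The type and constant names echo the installer's azure package, but the NoPlan constant and helper function are assumptions here.

```go
// Hedged sketch, not the installer's actual code: choose the Azure marketplace
// image type depending on whether the image has purchase-plan terms. ARO's
// images are published without a plan, so always stamping WithPlan breaks them.
package azureimage

type ImageType string

const (
	AzureImageTypeMarketplaceWithPlan ImageType = "MarketplaceWithPlan"
	AzureImageTypeMarketplaceNoPlan   ImageType = "MarketplaceNoPlan" // assumed name
)

func marketplaceImageType(hasPurchasePlan bool) ImageType {
	if hasPurchasePlan {
		return AzureImageTypeMarketplaceWithPlan
	}
	return AzureImageTypeMarketplaceNoPlan
}
```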

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1. Set RHCOS image from Azure Marketplace in the installconfig
2. Deploy a cluster
3.

Actual results:

Only compute nodes use the Marketplace image.

Expected results:

All nodes created by the Installer use RHCOS image coming from Azure Marketplace.

Additional info:

 

 

Epic Goal

  • As a customer, I need to make sure that the RHCOS image I leverage is coming from a trusted source. 

Why is this important?

  • For customer who have a very restricted security policies imposed by their InfoSec teams they need to be able to manually specify a custom location for the RHCOS image to use for the Cluster Nodes.

Scenarios

  1. As a customer, I want to specify a custom location for the RHCOS image to be used for the cluster Nodes

Acceptance Criteria

A user is able to specify a custom location in the Installer manifest for the RHCOS image to be used for bootstrap and cluster Nodes. This is similar to the approach we already support for AWS with the compute.platform.aws.amiID option

Previous Work (Optional):

https://issues.redhat.com/browse/CORS-1103

 

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

 

 

 

 

 

User Story:

As a user, I want to be able to:

  • Specify a RHCOS image coming from a custom source in the install config to override the installer's internal choice of bootimage  

so that I can achieve

  • a custom location in the install config for the RHCOS image to use for the Cluster Nodes

Acceptance Criteria:

A user is able to specify a custom location in the Installer manifest for the RHCOS image to be used for bootstrap and cluster Nodes. This is similar to the approach we already support for AWS with the compute.platform.aws.amiID option

(optional) Out of Scope:

 

Engineering Details:

  •  

User Story:

Some background on the Licenses field:

https://github.com/openshift/installer/pull/3808#issuecomment-663153787

https://github.com/openshift/installer/pull/4696

So we do not want to allow licenses to be specified (it's up to customers to create a custom image with licenses embedded and supply that to the Installer) when pre-built images are specified (current behaviour). Since we don't need to specify licenses for RHCOS images anymore, the Licenses field is useless and should be deprecated.

Acceptance Criteria:

Description of criteria:

  • Licenses field deprecated
  • Any dev docs mentioning Licenses are updated.

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Epic Goal

  • Enable the migration from a storage in-tree driver to a CSI-based driver with minimal impact to the end user, applications, and cluster
  • These migrations would include, but are not limited to:
    • CSI driver for Azure (file and disk)
    • CSI driver for VMware vSphere

Why is this important?

  • OpenShift needs to maintain its ability to enable PVCs and PVs of the main storage types
  • CSI Migration is getting close to GA; we need to have the feature fully tested and enabled in OpenShift
  • Upstream in-tree drivers are being deprecated to make way for the CSI drivers prior to in-tree driver removal

Scenarios

  1. User-initiated move from in-tree to CSI driver
  2. Upgrade-initiated move from in-tree to CSI driver
  3. Upgrade from EUS to EUS

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal*

Kubernetes upstream has chosen to allow users to opt-out from CSI volume migration in Kubernetes 1.26 (1.27 PR, 1.26 backport). It is still GA there, but allows opt-out due to non-trivial risk with late CSI driver availability.

We want a similar capability in OCP - a cluster admin should be able to opt-in to CSI migration on vSphere in 4.13. Once they opt-in, they can't opt-out (at least in this epic).

Why is this important? (mandatory)

See an internal OCP doc if / how we should allow a similar opt-in/opt-out in OCP.

 
Scenarios (mandatory) 

Upgrade

  1. Admin upgrades 4.12 -> 4.13 as usual.
  2. Storage CR has CSI migration disabled (or nil); the in-tree volume plugin handles in-tree PVs.
  3. At the same time, external CCM runs; however, because kubelet runs with --cloud-provider=vsphere, it does not do kubelet's job.
  4. Admin can opt in to CSI migration by editing the Storage CR. That enables the OPENSHIFT_DO_VSPHERE_MIGRATION env. var. everywhere + runs kubelet with --cloud-provider=external.
    1. If we have time, it should not be hard to opt out: just remove the env. var + update the kubelet cmdline. Storage / in-tree volume plugin will handle in-tree PVs again; not sure about implications on external CCM.
  5. Once opted in, it's not possible to opt out.
  6. Both with opt-in and without it, the cluster is Upgradeable=true. Admin can upgrade to 4.14, where CSI migration will be forced.

 

New install

  1. Admin installs a new 4.13 vSphere cluster, with UPI, IPI, Assisted Installer, or Agent-based Installer.
  2. During installation, Storage CR is created with CSI migration enabled
  3. (We want to have it enabled for a new cluster to enable external CCM and have zonal.  This avoids new clusters from having in-tree as default and then having to go through migration later.)
  4. Resulting cluster has OPENSHIFT_DO_VSPHERE_MIGRATION env. var set + kubelet with –cloud-provider=external + topology support.
  5. Admin cannot opt-out after installation, we expect that they use CSI volumes for everything.
  1. If the admin really wants, they can opt-out before installation by adding a Storage install manifest with CSI migration disabled.

 

EUS to EUS (4.12 -> 4.14)

  • Will have CSI migration enabled once in 4.14
  • During the upgrade, a cluster will have 4.13 masters with CSI migration disabled (see regular upgrade to 4.13 above) + 4.12 kubelets.
  • Once the masters are 4.14, CSI migration is force-enabled there, still, 4.14 KCM + in-tree volume plugin in it will handle in-tree volume attachments required by kubelets that still have 4.12 (that’s what kcm --external-cloud-volume-plugin=vsphere does).
  • Once both masters + kubelets are 4.14, CSI migration is force enabled everywhere, in-tree volume plugin + cloud provider in KCM is still enabled by --external-cloud-volume-plugin, but it’s not used.
  • Keep in-tree storage class by default
  • A CSI storage class is already available since 4.10
  • Recommend to switch default to CSI
  • Can’t opt out from migration

Dependencies (internal and external) (mandatory)
  • We need a new FeatureSet in openshift/api that disables CSIMigrationvSphere feature gate.
  • We need kube-apiserver-operator, kube-controller-manager-operator, kube-scheduler-operator, and MCO to reconfigure their operands to use the in-tree vSphere cloud provider when they see the CSIMigrationvSphere FeatureGate disabled.
  • We need the cloud controller manager operator to disable its operand when it sees the CSIMigrationvSphere FeatureGate disabled.

Contributing Teams(and contacts) (mandatory) 

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

When CSIMigrationvSphere is disabled, cluster-storage-operator must re-create the in-tree StorageClass.

vmware-vsphere-csi-driver-operator's StorageClass must not be marked as the default there (IMO we already have code for that).

This also means we need to fix the Disable SC e2e test to ignore StorageClasses for the in-tree driver. Otherwise we will reintroduce OCPBUGS-7623.
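A sketch of the kind of in-tree StorageClass asset involved, assuming the usual "thin" class backed by the kubernetes.io/vsphere-volume provisioner; the exact asset the operator applies may differ.

```go
// Sketch only: the in-tree vSphere StorageClass that would be kept applied
// while CSIMigrationvSphere is disabled. It must not be annotated as the
// default class when the CSI driver's class is default.
package storageassets

import (
	corev1 "k8s.io/api/core/v1"
	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func inTreeVSphereStorageClass() *storagev1.StorageClass {
	reclaim := corev1.PersistentVolumeReclaimDelete
	binding := storagev1.VolumeBindingImmediate
	return &storagev1.StorageClass{
		ObjectMeta: metav1.ObjectMeta{
			Name: "thin", // assumed name of the in-tree class
			Annotations: map[string]string{
				// explicitly not the default class
				"storageclass.kubernetes.io/is-default-class": "false",
			},
		},
		Provisioner:       "kubernetes.io/vsphere-volume",
		ReclaimPolicy:     &reclaim,
		VolumeBindingMode: &binding,
	}
}
```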

Feature Overview

  • Customers want to create and manage OpenShift clusters using managed identities for Azure resources for authentication.

Goals

  • A customer using ARO wants to spin up an OpenShift cluster with "az aro create" without needing additional input, i.e. without the need for an AD account or service principal credentials, and the identity used is never visible to the customer and cannot appear in the cluster.
  • As an administrator, I want to deploy OpenShift 4 and run Operators on Azure using access controls (IAM roles) with temporary, limited privilege credentials.

Requirements

  • Azure managed identities must work for installation with all install methods including IPI and UPI, work with upgrades, and day-to-day cluster lifecycle operations.
  • Support HyperShift and non-HyperShift clusters.
  • Support use of Operators with Azure managed identities.
  • Support in all Azure regions where Azure managed identity is available. Note: federated credentials are associated with Azure Managed Identity, and federated credentials are not available in all Azure regions.

More details at ARO managed identity scope and impact.

 

This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • ...

Out of Scope

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

References

Epic Goal

  • Enable the OpenShift Installer to authenticate using authentication methods supported by both the azure sdk for go and the terraform azure provider
  • Future proofing to enable Terraform support for workload identity authentication when it is enabled upstream

Why is this important?

  • This ties in to the larger OpenShift goal of: as an infrastructure owner, I want to deploy OpenShift on Azure using Azure Managed Identities (vs. using Azure Service Principal) for authentication and authorization.
  • Customers want support for using Azure managed identities in lieu of using an Azure service principal. In the OpenShift documentation, we are directed to use an Azure Service Principal - "Azure offers the ability to create service accounts, which access, manage, or create components within Azure. The service account grants API access to specific services". However, Microsoft and the customer would prefer that we use User Managed Identities to keep from putting the Service Principal and principal password in clear text within the azure.conf file. 
  • See https://docs.microsoft.com/en-us/azure/active-directory/develop/workload-identity-federation for additional information.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ...

Open questions::

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a cluster admin I want to be able to:

  • use the managed identity from the installer host VM (running in Azure)

so that I can

  • install a cluster without copying credentials to the installer host

Acceptance Criteria:

Description of criteria:

  • Installer (azure sdk) & terraform authenticate using identity from host VM (not client secret in file ~/.azure/servicePrincipal.json)
  • Cluster credential is handled appropriately (presumably we force manual mode)

Engineering Details:
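A hedged sketch of the authentication piece only: picking up the host VM's managed identity via azure-sdk-for-go's azidentity package instead of reading a client secret from ~/.azure/servicePrincipal.json. The resource-group listing is just a placeholder call to show the credential in use, and the subscription ID is a placeholder too.

```go
// Hedged sketch: authenticate with the installer host VM's managed identity
// (via IMDS) rather than a client secret on disk.
package main

import (
	"context"
	"fmt"

	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armresources"
)

func main() {
	// ManagedIdentityCredential uses the VM's identity; no secrets on disk.
	cred, err := azidentity.NewManagedIdentityCredential(nil)
	if err != nil {
		panic(err)
	}
	client, err := armresources.NewResourceGroupsClient("<subscription-id>", cred, nil)
	if err != nil {
		panic(err)
	}
	pager := client.NewListPager(nil)
	for pager.More() {
		page, err := pager.NextPage(context.Background())
		if err != nil {
			panic(err)
		}
		for _, rg := range page.Value {
			fmt.Println(*rg.Name)
		}
	}
}
```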

Epic Overview

  • Enable customers to create and manage OpenShift clusters using managed identities for Azure resources for authentication.
  • A customer using ARO wants to spin up an OpenShift cluster with "az aro create" without needing additional input, i.e. without the need for an AD account or service principal credentials, and the identity used is never visible to the customer and cannot appear in the cluster.

Epic Goal

  • A customer creates an OpenShift cluster ("az aro create") using Azure managed identity.
  • Azure managed identities must work for installation with all install methods including IPI and UPI, work with upgrades, and day-to-day cluster lifecycle operations.
  • After Azure failed to implement workable golang API changes after deprecation of their old API, we have removed mint mode and work entirely in passthrough mode. Azure has plans to implement pod/workload identity similar to how they have been implemented in AWS and GCP, and when this feature is available, we should implement permissions similar to AWS/GCP
  • This work cannot start until Azure have implemented this feature - as such, this Epic is a placeholder to track the effort when available.

Why is this important?

  • Microsoft and the customer would prefer that we use Managed Identities vs. Service Principal (which requires putting the Service Principal and principal password in clear text within the azure.conf file).

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

 

This effort is dependent on the completion of work for CCO-187. The work in dependent modules is planned to be done by the CCO team unless individual repo owners can help. Operator owners/teams will be expected to review merge requests and complete the appropriate QE effort for an OpenShift release.

  • azure-sdk-for-go module dependency updated to support workload identity federation.
  • Mount the OIDC token in the operator pod. This needs to go in the deployment. See the example added to the cluster-image-registry-operator here (a hedged sketch follows below).
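As an illustration of the second bullet, a minimal sketch of a bound service account token projected into an operator Deployment follows. The volume name, audience, and mount path are assumptions chosen for this sketch and must match whatever the Azure SDK and pod identity webhook are configured to expect; this is not the exact cluster-image-registry-operator manifest.

# Hedged sketch: volume name, audience and mountPath are illustrative assumptions.
spec:
  template:
    spec:
      containers:
      - name: operator
        volumeMounts:
        - name: bound-sa-token
          mountPath: /var/run/secrets/openshift/serviceaccount
          readOnly: true
      volumes:
      - name: bound-sa-token
        projected:
          sources:
          - serviceAccountToken:
              audience: openshift
              path: token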


Create a config secret in the openshift-cloud-credential-operator namespace which contains the AZURE_TENANT_ID to be used for configuring the Azure AD pod identity webhook deployment.
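A minimal sketch of such a config secret follows; the Secret name and the key name used here are assumptions for illustration only.

# Hedged sketch: Secret name and key name are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: azure-ad-pod-identity-webhook-config   # hypothetical name
  namespace: openshift-cloud-credential-operator
type: Opaque
stringData:
  azure_tenant_id: "<tenant-id>"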

These docs should cover:

  • A general overview of the feature, what changes are made to Azure credentials secrets and how to install a new cluster.
  • A usage guide of `ccoctl azure` commands to create/manage infra required for Azure workload identity.

See existing documentation for:

This effort is dependent on the completion of work for CCO-187. The work in dependent modules is planned to be done by the CCO team unless individual repo owners can help. Operator owners/teams will be expected to review merge requests and complete the appropriate QE effort for an OpenShift release.

  • azure-sdk-for-go module dependency updated to support workload identity federation.
  • Mount the OIDC token in the operator pod. This needs to go in the deployment. See the example added to the cluster-image-registry-operator here.


Epic Goal

  • CIRO can consume azure workload identity tokens
  • CIRO's Azure credential request uses new API field for requesting permissions

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

This effort is dependent on the completion of work for CCO-187. The work in dependent modules is planned to be done by the CCO team unless individual repo owners can help. Operator owners/teams will be expected to review merge requests and complete the appropriate QE effort for an OpenShift release.

  • azure-sdk-for-go module dependency updated to support workload identity federation.
  • Mount the OIDC token in the operator pod. This needs to go in the deployment. See the example added to the cluster-image-registry-operator here.

 

ACCEPTANCE CRITERIA

  • Upstream distribution/distribution uses azure identity sdk 1.3.0
  • openshift/docker-distribution uses the latest upstream distribution/distribution (after the above has merged)
  • Green CI
  • Every storage driver passes regression tests

OPEN QUESTIONS

  • Can DefaultAzureCredential be relied on to transparently use workload identities? (In this case the operator would need to export the environment variables that DefaultAzureCredential expects for workload identities; see the sketch below.)
    • I have tested manually exporting the required env vars, and DefaultAzureCredential correctly detects and attempts to authenticate using federated workload identity, so it works as expected.
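For reference, a hedged sketch of the container environment that DefaultAzureCredential's workload identity path reads; the values and the token mount path are placeholders, not the operator's actual configuration.

# Sketch of container env for federated workload identity; values are placeholders.
env:
- name: AZURE_CLIENT_ID
  value: "<client-id-of-the-user-assigned-managed-identity>"
- name: AZURE_TENANT_ID
  value: "<tenant-id>"
- name: AZURE_FEDERATED_TOKEN_FILE
  value: /var/run/secrets/openshift/serviceaccount/token   # assumed mount path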

This effort is dependent on the completion of work for CCO-187. The work in dependent modules is planned to be done by the CCO team unless individual repo owners can help. Operator owners/teams will be expected to review merge requests and complete the appropriate QE effort for an OpenShift release.

  • azure-sdk-for-go module dependency updated to support workload identity federation.
  • Mount the OIDC token in the operator pod. This needs to go in the deployment. See the example added to the cluster-image-registry-operator here.

 

ACCEPTANCE CRITERIA

  • CIRO should retrieve the "azure_resourcegroup" from the cluster Infrastructure object instead of the CCO-created secret (this key will not be present when workload identity is in use)
  • CIRO's CredentialsRequest specifies the service account names (see the cluster-storage-operator for an example, and the sketch after this list)
  • CIRO is able to create storage accounts and containers when configured with Azure workload identity.
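A hedged sketch of a CredentialsRequest that specifies service account names follows; the names, namespaces, and provider spec shown are illustrative assumptions rather than the exact CIRO manifest.

# Sketch only; names and namespaces are illustrative.
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: openshift-image-registry-azure
  namespace: openshift-cloud-credential-operator
spec:
  serviceAccountNames:
  - registry
  - cluster-image-registry-operator
  secretRef:
    name: installer-cloud-credentials
    namespace: openshift-image-registry
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AzureProviderSpec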

This effort is dependent on the completion of work for CCO-187. The work in dependent modules is planned to be done by the CCO team unless individual repo owners can help. Operator owners/teams will be expected to review merge requests and complete the appropriate QE effort for an OpenShift release.

  • azure-sdk-for-go module dependency updated to support workload identity federation.
  • Mount the OIDC token in the operator pod. This needs to go in the deployment. See the example added to the cluster-image-registry-operator here.

 

ACCEPTANCE CRITERIA

  • image-registry uses latest openshift/docker-distribution
  • CIRO can detect when the creds it gets from CCO are for federated workload identity (the credentials secret will contain an "azure_federated_token_file" key; see the sketch after this list)
  • when using federated workload identity, CIRO adds the "AZURE_FEDERATED_TOKEN_FILE" env var to the image-registry deployment
  • when using federated workload identity, CIRO does not add the "REGISTRY_STORAGE_AZURE_ACCOUNTKEY" env var to the image-registry deployment
  • the image-registry operates normally when using federated workload identity
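For context, a hedged sketch of what the CCO-provided credentials Secret might contain when federated workload identity is in use; the key names and values are assumptions based on the criteria above (note the absence of an account key and of azure_resourcegroup).

# Sketch only; key names and values are assumptions.
apiVersion: v1
kind: Secret
metadata:
  name: installer-cloud-credentials   # hypothetical name
  namespace: openshift-image-registry
stringData:
  azure_client_id: "<client-id>"
  azure_tenant_id: "<tenant-id>"
  azure_subscription_id: "<subscription-id>"
  azure_region: "<region>"
  azure_federated_token_file: /var/run/secrets/openshift/serviceaccount/token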

Epic Goal

  • Build a list of the specific permissions required to run OpenShift on Azure - components grant roles today, but we need more granularity.
  • Determine and document the Azure roles and required permissions for Azure managed identity.

Why is this important?

  • Many of our customers have organizational security policies that restrict credentials to minimal permissions, which conflicts with the documented list of permissions needed for OpenShift. Customers need to know the explicit, minimal list of permissions needed to deploy and run OpenShift, and what each permission is used for, so they can request the right permissions. Without this information, adoption of OpenShift 4 can be blocked in many cases.

Scenarios

  1. ...

Acceptance Criteria

  • Document explicit list of required credential permissions for installing (Day 1) OpenShift on Azure using the IPI and UPI deployment workflows and what each of the permissions are used for.
  • Document explicit list of required role and credential permissions for the operation (Day 2) of an OpenShift cluster on Azure and what each of the permissions are used for
  • Verify minimum list of permissions for Azure with IPI and UPI installation workflows
  • (Day 2) operations of OpenShift on Azure - MUST complete successfully with automated tests
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. Installer [both UPI & IPI Workflows]
  2. Control Plane
    • Kube Controller Manager
  3. Compute [Managed Identity]
  4. Cloud API enabled components
    • Cloud Credential Operator
    • Machine API
    • Internal Registry
    • Ingress

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

 

User Story

As a cluster admin, I want the CCM and Node manager to utilize credentials generated by CCO so that the permissions granted to the identity can be scoped with least privilege on clusters utilizing Azure AD Workload Identity.

Background

The Cloud Controller Manager Operator creates a CredentialsRequest as part of CVO manifests which describes credentials that should be created for the CCM and Node manager to utilize. CCM and the Node Manager do not use the credentials created as a product of the CredentialsRequest in existing "passthrough" based Azure clusters or within Azure AD Workload Identity based Azure clusters. CCM and the Node Manager instead use a system-assigned identity which is attached to the Azure cluster VMs.

The system-assigned identity attached to the VMs is granted the "Contributor" role within the cluster's Azure resource group. In order to use the system-assigned identity, a pod must have sufficient privilege to use the host network to contact the Azure instance metadata service (IMDS). 

For Azure AD Workload Identity based clusters, administrators must process the CredentialsRequests extracted from the release image which includes the CredentialsRequest from CCCMO manifests. This CredentialsRequest processing results in the creation of a user-assigned managed identity which is not utilized by the cluster. Additionally, the permissions granted to the identity are currently scoped broadly to grant the "Contributor" role within the cluster's Azure resource group. If the CCM and Node Manager were to utilize the identity then we could scope the permissions granted to the identity to be more granular. It may be confusing to administrators to need to create this unused user-assigned managed identity with broad permissions access.

Steps

  • Modify CCM and Node manager deployments to use the CCCMO's Azure credentials injector as an init-container to merge the provided CCO credentials secret with the /etc/kube/cloud.conf file used to configure cloud-provider-azure as used within CCM and the Node Manager. An example of the init-container can be found within the azure-file-csi-driver-operator.
  • Validate that the provided credentials are used by CCM and the Node Manager and that they continue to operate normally.
  • Scope permissions specified in the CCCMO CredentialsRequest to only those permissions needed for operation rather than "Contributor" within the Azure resource group.

Stakeholders

  • <Who is interested in this/where did they request this>

Definition of Done

  • CCM and Node Manager use credentials provided by CCO rather than the system-assigned identity attached to the VMs.
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • e2e tests validate that the CCM and Node manager operate normally with the credentials provided by CCO.

Add a new field (DataPermissions) to the Azure CredentialsRequest CR and plumb it into the data actions of the custom role assigned to the generated user-assigned managed identity.
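A hedged sketch of how the new field might look in the Azure provider spec of a CredentialsRequest; the exact field names and permission strings below are assumptions for illustration, not the final API.

# Sketch only; field names and permission strings are illustrative assumptions.
providerSpec:
  apiVersion: cloudcredential.openshift.io/v1
  kind: AzureProviderSpec
  permissions:                 # control-plane actions for the generated custom role
  - Microsoft.Storage/storageAccounts/read
  dataPermissions:             # data actions for the generated custom role (the new field)
  - Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read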

User Story

As a [user|developer|<other>] I want [some goal] so that [some reason]

<Describes high level purpose and goal for this story. Answers the questions: Who is impacted, what is it and why do we need it?>

Background

<Describes the context or background related to this story>

Steps

  • <Add steps to complete this card if appropriate>

Stakeholders

  • <Who is interested in this/where did they request this>

Definition of Done

  • <Add items that need to be completed for this card>
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

Add actuator code to satisfy permissions specified in 'Permissions' API field. The implementation should create a new custom role with specified permissions and assign it to the generated user-assigned managed identity along with the predefined roles enumerated in CredReq.RoleBindings. The role we create for the CredentialsRequest should be discoverable so that it can be idempotently updated on re-invocation of ccoctl.

Questions to answer based on lessons learned from custom roles in GCP, assuming that we will create one custom role per identity:

  • Does Azure have soft/hard role deletion? I.e., are custom roles retained for some period following deletion, and if so, do deleted roles count toward quota?
  • What is the default quota limitation for custom roles in Azure?
  • Does it make sense to create a custom role for each identity created based on quota limitations?
    • If it doesn't make sense, how can the roles be condensed to satisfy the quota limitations?

User Story

As a [user|developer|<other>] I want [some goal] so that [some reason]

<Describes high level purpose and goal for this story. Answers the questions: Who is impacted, what is it and why do we need it?>

Background

<Describes the context or background related to this story>

Steps

  • <Add steps to complete this card if appropriate>

Stakeholders

  • <Who is interested in this/where did they request this>

Definition of Done

  • <Add items that need to be completed for this card>
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

Feature Overview

RHEL CoreOS should be updated to RHEL 9.2 sources to take advantage of newer features, hardware support, and performance improvements.

 

Requirements

  • RHEL 9.x sources for RHCOS builds starting with OCP 4.13 and RHEL 9.2.

 

Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

(Optional) Use Cases

  • 9.2 Preview via Layering: no longer necessary, assuming we stay the course of going all in on 9.2.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

Epic Goal

  • The Kernel API was updated for RHEL 9, so the old approach of setting the `sched_domain` in `/sys/kernel` is no longer available. Instead, cgroups have to be worked with directly.
  • Both CRI-O and PAO need to be updated to set the cpuset of containers and other processes correctly, as well as set the correct value for sched_load_balance

Why is this important?

  • CPU load balancing is a vital piece of real-time execution for processes that need exclusive access to a CPU. Without this, CPU load balancing won't work on RHEL 9 with OpenShift 4.13.

Scenarios

  1. As a developer on OpenShift, I expect my pods to run with exclusive CPUs if I set the PAO configuration correctly

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ...

Open questions:

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Part of setting CPU load balancing on RHEL 9 involves disabling sched_load_balance on cgroups that contain a cpuset that should be exclusive. The PAO may need to be responsible for this piece.

This is the Epic to track the work to add RHCOS 9 in OCP 4.13 and to make OCP use it by default.

 

CURRENT STATUS: Landed in 4.14 and 4.13

 

Testing with layering

 

Another option, given an existing (e.g. 4.12) cluster, is to use layering. First, get a digested pull spec for the current build:

$ skopeo inspect --format "{{.Name}}@{{.Digest}}" -n docker://quay.io/openshift-release-dev/ocp-v4.0-art-dev:4.13-9.2
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b4cc3995d5fc11e3b22140d8f2f91f78834e86a210325cbf0525a62725f8e099

Create a MachineConfig that looks like this:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-override
spec:
  osImageURL: <digested pull spec>

If you want to also override the control plane, create a similar one for the master role.
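For example, the control plane override is the same manifest with the role label and name changed:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: master-override
spec:
  osImageURL: <digested pull spec>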
 
We don't yet have auto-generated release images. However, if you want one, you can ask cluster bot to e.g. "launch https://github.com/openshift/machine-config-operator/pull/3485" with options you want (e.g. "azure" etc.) or just "build https://github.com/openshift/machine-config-operator/pull/3485" to get a release image.

STATUS:  Code is merged for 4.13 and is believed to largely solve the problem.

 


 

Description of problem:

Upgrades from OpenShift 4.12 to 4.13 will also upgrade the underlying RHCOS from 8.6 to 9.2. As part of that, the names of the network interfaces may change. For example `eno1` may be renamed to `eno1np0`. If a host is using NetworkManager configuration files that rely on those names then the host will fail to connect to the network when it boots after the upgrade. For example, if the host had static IP addresses assigned it will instead boot using IP addresses assigned via DHCP.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always.

Steps to Reproduce:

1. Select hardware (or VMs) that will have different network interface names in RHCOS 8 and RHCOS 9, for example `eno1` in RHCOS 8 and `eno1np0` in RHCOS 9.

2. Install a 4.12 cluster with static network configuration using the `interface-name` field of NetworkManager interface configuration files to match the configuration to the network interface.

3. Upgrade the cluster to 4.13.

Actual results:

The NetworkManager configuration files are ignored because they no longer match the NIC names. Instead the NICs get new IP addresses from DHCP.

Expected results:

The NetworkManager configuration files are updated as part of the upgrade to use the new NIC names.

Additional info:

Note this is a hypothetical scenario. We have detected this potential problem in a slightly different scenario where we install a 4.13 cluster with the assisted installer. During the discovery phase we use RHCOS 8 and we generate the NetworkManager configuration files. Then we reboot into RHCOS 9, and the configuration files are ignored due to the change in NIC names. See MGMT-13970 for more details.

BU Priority Overview

Enable installation and lifecycle support of OpenShift 4 on Oracle Cloud Infrastructure (OCI) with VMs

Goals

  • Enable installation of OpenShift 4 on Oracle Cloud Infrastructure (OCI) with VMs using a platform-agnostic approach with the Assisted Installer.
  • OpenShift 4 on OCI (with VMs) can be updated such that the cluster and its applications are in a healthy state when the update is completed.
  • Telemetry reports back on clusters using OpenShift 4 on OCI for connected OpenShift clusters (e.g. platform=none using the Oracle CSI driver).

State of the Business

Currently, we don't yet support OpenShift 4 on Oracle Cloud Infrastructure (OCI), and we know from initial attempts that installing OpenShift on OCI requires the use of a QCOW image (the OpenStack QCOW seems to work fine) and networking and routing changes, and that there are storage issues, potential MTU and registry issues, etc.

Execution Plans

TBD based on customer demand.

 

Why is this important

  • OCI is starting to gain momentum.
  • In the Middle East (e.g. Saudi Arabia), only OCI and Alibaba Cloud are approved hyperscalers.

Requirements

  • This section: a list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

 

Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

RFEs:

  • RFE-3635 - Supporting Openshift on Oracle Cloud Infrastructure(OCI) & Oracle Private Cloud Appliance (PCA)

Questions to answer…

  • ...

Out of Scope

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

 

Other

 

 

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

In order to install the Oracle CCM driver, we need the ability to set the platform to "external" in the install-config.

The platform needs to be added here: https://github.com/openshift/assisted-service/blob/3496d1d2e185343c6a3b1175c810fdfd148229b2/internal/installcfg/installcfg.go#L8

Slack thread: https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1678801176091619
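A hedged sketch of what the external platform stanza in the install-config might look like once installer support lands; treat the field names and values as assumptions to be checked against the installer's released schema.

# Sketch only; schema must follow the released openshift/installer external platform support.
apiVersion: v1
metadata:
  name: example-cluster
platform:
  external:
    platformName: oci
    cloudControllerManager: External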

The goal of this ticket is to check whether, besides setting the external platform, the Assisted Installer can also install the CCM, and to document it.

The taint here: https://github.com/openshift/assisted-installer/pull/629/files#diff-1046cc2d18cf5f82336bbad36a2d28540606e1c6aaa0b5073c545301ef60ffd4R593

should only be removed when the platform is Nutanix or vSphere, because the credentials for these platforms are passed after cluster installation.

By contrast, with Oracle Cloud the instance gets its credentials through the instance metadata service, so the CCM should be able to label the nodes from the beginning of the installation without any user intervention.

We currently rely on a hack to deploy a cluster on the external platform: https://github.com/openshift/assisted-service/pull/5312

The goal of this ticket is to move the definition of the external platform into the install-config once the OpenShift installer is released with support for the external platform: https://github.com/openshift/installer/pull/7217

There are 2 options to detect if the hosts are running on OCI:

1/ On OCI, the machine will have the following chassis-asset-tag:

# dmidecode --string chassis-asset-tag
OracleCloud.com

In the agent, we can override hostInventory.SystemVendor.Manufacturer when chassis-asset-tag="OracleCloud.com".

2/  Read instance metadata: curl -v -H "Authorization: Bearer Oracle"  http://169.254.169.254/opc/v2/instance

This will allow auto-detection of the platform from the provider in assisted-service, and validation that hosts are running in OCI when installing a cluster with platform=oci.

Currently the API call "GET /v2/clusters/{cluster_id}/supported-platforms" returns the hosts' supported platforms regardless of the other cluster parameters.

Description of the problem:

Currently, the infrastructure object is created as follows:

 # oc get infrastructure/cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-06-19T13:49:07Z"
  generation: 1
  name: cluster
  resourceVersion: "553"
  uid: 240dc176-566e-4471-b9db-fb25c676ba33
spec:
  cloudConfig:
    name: ""
  platformSpec:
    type: None
status:
  apiServerInternalURI: https://api-int.test-infra-cluster-97ef21c5.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  apiServerURL: https://api.test-infra-cluster-97ef21c5.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  controlPlaneTopology: HighlyAvailable
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: test-infra-cluster-97-w6b42
  infrastructureTopology: HighlyAvailable
  platform: None
  platformStatus:
    type: None

Instead, it should be similar to:

# oc get infrastructure/cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-06-19T13:49:07Z"
  generation: 1
  name: cluster
  resourceVersion: "553"
  uid: 240dc176-566e-4471-b9db-fb25c676ba33
spec:
  cloudConfig:
    name: ""
  platformSpec:
    type: External
    external:
      platformName: oci
status:
  apiServerInternalURI: https://api-int.test-infra-cluster-97ef21c5.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  apiServerURL: https://api.test-infra-cluster-97ef21c5.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  controlPlaneTopology: HighlyAvailable
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: test-infra-cluster-97-w6b42
  infrastructureTopology: HighlyAvailable
  platform: External
  platformStatus:
    type: External
    external:
      cloudControllerManager:
        state: External

How reproducible:

 

Steps to reproduce:

1.

2.

3.

Actual results:

 

Expected results:

The external platform will be available behind the TechPreviewNoUpgrade feature set; automatically enable this feature set in the installer config when the oci platform is selected.

Description of the problem:
The features API tells us that EXTERNAL_PLATFORM_OCI is supported for version 4.14 and the s390x CPU architecture, but the attempt to create the cluster fails with "Can't set oci platform on s390x architecture".
 

 

Steps to reproduce:

1. Register cluster with OCI platform and z architecture

 

Description of the problem:

I've tested a cluster with platform type = 'baremetal' and hosts discovered. Then, when I try to change to the Nutanix platform, the backend returns an error.

How reproducible:

100% 

Steps to reproduce:

1. Create cluster without platform integration

2. Discover 3 hosts

3. Try to change platform to 'Nutanix'

Actual results:

API returns an error.

Expected results:
We can change the platform type; this change should be agnostic to the discovered hosts.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Create a new platform type, working name "External", that will signify when a cluster is deployed on a partner infrastructure where core cluster components have been replaced by the partner. “External” is different from our current platform types in that it will signal that the infrastructure is specifically not “None” or any of the known providers (eg AWS, GCP, etc). This will allow infrastructure partners to clearly designate when their OpenShift deployments contain components that replace the core Red Hat components.

This work will require updates to the core OpenShift API repository to add the new platform type, and then a distribution of this change to all components that use the platform type information. For components that partners might replace, per-component action will need to be taken, with the project team's guidance, to ensure that the component properly handles the "External" platform. These changes will look slightly different for each component.

To integrate these changes more easily into OpenShift, it is possible to take a multi-phase approach which could be spread over a release boundary (eg phase 1 is done in 4.X, phase 2 is done in 4.X+1).

Phase 1

  • Write platform “External” enhancement.
  • Evaluate changes to cluster capability annotations to ensure coverage for all replaceable components.
  • Meet with component teams to plan specific changes that will allow for supplement or replacement under platform "External".

Phase 2

  • Update OpenShift API with new platform and ensure all components have updated dependencies.
  • Update capabilities API to include coverage for all replaceable components.
  • Ensure all Red Hat operators tolerate the "External" platform and treat it the same as "None" platform.

Phase 3

  • Update components based on identified changes from phase 1
    • Update Machine API operator to run core controllers in platform "External" mode.

Why is this important?

  • As partners begin to supplement OpenShift's core functionality with their own platform specific components, having a way to recognize clusters that are in this state helps Red Hat created components to know when they should expect their functionality to be replaced or supplemented. Adding a new platform type is a significant data point that will allow Red Hat components to understand the cluster configuration and make any specific adjustments to their operation while a partner's component may be performing a similar duty.
  • The new platform type also helps with support to give a clear signal that a cluster has modifications to its core components that might require additional interaction with the partner instead of Red Hat. When combined with the cluster capabilities configuration, the platform "External" can be used to positively identify when a cluster is being supplemented by a partner, and which components are being supplemented or replaced.

Scenarios

  1. A partner wishes to replace the Machine controller with a custom version that they have written for their infrastructure. Setting the platform to "External" and advertising the Machine API capability gives a clear signal to the Red Hat created Machine API components that they should start the infrastructure generic controllers but not start a Machine controller.
  2. A partner wishes to add their own Cloud Controller Manager (CCM) written for their infrastructure. Setting the platform to "External" and advertising the CCM capability gives a clear signal to the Red Hat created CCM operator that the cluster should be configured for an external CCM that will be managed outside the operator. Although the Red Hat operator will not provide this functionality, it will configure the cluster to expect a CCM.

Acceptance Criteria

Phase 1

  • Partners can read "External" platform enhancement and plan for their platform integrations.
  • Teams can view jira cards for component changes and capability updates and plan their work as appropriate.

Phase 2

  • Components running in cluster can detect the “External” platform through the Infrastructure config API
  • Components running in cluster react to “External” platform as if it is “None” platform
  • Partners can disable any of the platform specific components through the capabilities API

Phase 3

  • Components running in cluster react to the “External” platform based on their function.
    • for example, the Machine API Operator needs to run a set of controllers that are platform agnostic when running in platform “External” mode.
    • the specific component reactions are difficult to predict currently, this criteria could change based on the output of phase 1.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. Identifying OpenShift Components for Install Flexibility

Open questions:

  1. Phase 1 requires talking with several component teams, the specific action that will be needed will depend on the needs of the specific component. At the least the components need to treat platform "External" as "None", but there could be more changes depending on the component (eg Machine API Operator running non-platform specific controllers).

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • As defined in the part (OCPBU-5), this epic is about adding the new "External" platform type and ensuring that the OpenShift operators which react to platform types treat the "External" platform as if it were a "None" platform.
  • Add an end-to-end test to exercise the "External" platform type

Why is this important?

  • This work lays the foundation for partners and users to customize OpenShift installations that might replace infrastructure level components.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Background

As described in the external platform enhancement, the cluster-cloud-controller-manager-operator should be modified to react to the external platform type in the same manner as platform none.

Steps

  • add an extra clause to the platform switch that will group "External" with "None"

Stakeholders

  • openshift eng

Definition of Done

  • CCCMO behaves as if platform None when External is selected
  • Docs
  • developer docs for CCCMO should be updated
  • Testing

Background

As described in the external platform enhancement, the machine-api-operator should be modified to react to the external platform type in the same manner as platform none.

Steps

  • add an extra clause to the platform switch that will group "External" with "None"

Stakeholders

  • openshift eng

Definition of Done

  • MAO behaves as if platform None when External is selected
  • Docs
  • developer docs for MAO should be updated
  • Testing

Feature Overview (aka. Goal Summary)  

Unify and update hosted control planes storage operators so that they have similar code patterns and can run properly in both standalone OCP and HyperShift's control plane.

Goals (aka. expected user outcomes)

  • Simplify the operators with a unified code pattern
  • Expose metrics from control-plane components
  • Use proper RBACs in the guest cluster
  • Scale the pods according to HostedControlPlane's AvailabilityPolicy
  • Add proper node selector and pod affinity for mgmt cluster pods

Requirements (aka. Acceptance Criteria):

  • OCP regression tests work in both standalone OCP and HyperShift
  • Code in the operators looks the same
  • Metrics from control-plane components are exposed
  • Proper RBACs are used in the guest cluster
  • Pods scale according to HostedControlPlane's AvailabilityPolicy
  • Proper node selector and pod affinity is added for mgmt cluster pods

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal*

In 4.12 we tried several approaches how to write an operator that works both in standalone OCP and in HyperShift's control plane running in the management cluster.

These operators need to be changed:

  • csi-snapshot-controller-operator
  • cluster-storage-operator
  • aws-ebs-csi-driver-operator

We need to unify the operators  to use similar approach, so the code in our operators look the same.

In addition, we need to update the operators to:

  • Expose metrics from control-plane components (esp. the csi-snapshot-controller and aws-ebs-csi-driver-controller pods).
  • Use proper RBACs in the guest cluster, so csi-snapshot-controller and aws-ebs-csi-driver-controller do not run as cluster-admin
    • Note that all components already have proper RBACs in the mgmt. cluster.
  • Scale csi-snapshot-controller and aws-ebs-csi-driver-controller pods according to HostedControlPlane's AvailabilityPolicy
  • Add proper node selector + pod affinity to all Pods in the mgmt cluster according to https://hypershift-docs.netlify.app/how-to/distribute-hosted-cluster-workloads/

Why is this important? (mandatory)

It will simplify our operators - we will have the same pattern in all of them.

 

Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1. OCP users (both standalone and hypershift) should not see any change.
  2. Code in the operators looks the same.

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development -  yes
  • Documentation - No
  • QE -  Regression tests only
  • PX - No
  • Others -

Acceptance Criteria (optional)

OCP regression tests work, both on standalone OCP and HyperShift.

Drawbacks or Risk (optional)

We could introduce regressions

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - basic e2e automation tests are merged and completing successfully
  • QE - Test scenarios are written and executed successfully.
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

We should refactor CSO to remove the duplication of code between HyperShift and standalone deployments.

We are also going to reduce the duplication of manifests so that templates can be reused between HyperShift and standalone clusters.

This feature is the placeholder for all epics related to technical debt associated with the Console team.

Outcome Overview

Once all Features and/or Initiatives in this Outcome are complete, what tangible, incremental, and (ideally) measurable movement will be made toward the company's Strategic Goal(s)?

 

Success Criteria

What is the success criteria for this strategic outcome?  Avoid listing Features or Initiatives and instead describe "what must be true" for the outcome to be considered delivered.

 

 

Expected Results (what, how, when)

What incremental impact do you expect to create toward the company's Strategic Goals by delivering this outcome? (Possible examples: unblocking sales, shifts in product metrics, etc.; provide links to metrics that will be used post-completion for review & pivot decisions.) For each expected result, list what you will measure and when you will measure it (e.g. provide links to existing information or metrics that will be used post-completion for review, and specify when you will review the measurement, such as 60 days after the work is complete).

 

 

Post Completion Review – Actual Results

After completing the work (as determined by the "when" in Expected Results above), list the actual results observed / measured during Post Completion review(s).

 

Feature Overview

Create an Azure cloud specific spec.resourceTags entry in the infrastructure CRD. This should create and update tags on any OpenShift cloud resource that we create and manage in Azure. The behaviour should also tag existing resources that do not yet have the tags, and once the tags in the infrastructure CRD are changed, all the resources should be updated accordingly.

Tag deletes continue to be out of scope, as the customer can still have custom tags applied to the resources that we do not want to delete.

Due to the ongoing intree/out of tree split on the cloud and CSI providers, this should not apply to clusters with intree providers (!= "external").

Once we are confident that all components are updated, we should introduce an end-to-end test that makes sure we never create resources that are untagged.
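For reference, a hedged sketch of how user-defined Azure tags are supplied at install time via the userTags field of the Azure platform section in the install-config; the keys and values are placeholders.

# Sketch: user-defined Azure tags in install-config; keys and values are placeholders.
platform:
  azure:
    region: eastus
    userTags:
      environment: production
      cost-center: "1234"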

 
Goals

  • Functionality on Azure Tech Preview
  • inclusion in the cluster backups
  • flexibility of changing tags during cluster lifetime, without recreating the whole cluster

Requirements

  • This section: a list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement | Notes | isMvp?
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES
Release Technical Enablement | Provide necessary release enablement details and documents. | YES

List any affected packages or components.

  • Installer
  • Cluster Infrastructure
  • Storage
  • Node
  • NetworkEdge
  • Internal Registry
  • CCO

This is a continuation of the CORS-2249 / CFE-671 work, where support for Azure tags was delivered as TechPreview in 4.13; the goal is to make it GA in 4.14. This involves removing any reference to TechPreview in code and docs and incorporating any feedback received from users.

Remove the code references that mark Azure Tags as TechPreview in the files below:

  • installer/data/data/install.openshift.io_installconfigs.yaml (PR#6820)
  • installer/pkg/explain/printer_test.go (PR#6820)
  • installer/pkg/types/azure/platform.go (PR#6820)
  • installer/pkg/types/validation/installconfig.go (PR#6820)

The Control Plane MachineSet enables OCP clusters to scale Control plane machines. This epic is about making the Control Plane MachineSet controller work with OpenStack.

Goal

  • The control plane nodes can be scaled up and down, lost and recovered.

Why is this important?

  • The procedure to recover from a failed control plane node and to add new nodes is lengthy. In order to increase scaling flexibility, a simpler mechanism needs to be supported.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://docs.openshift.com/container-platform/4.12/machine_management/control_plane_machine_management/cpmso-about.html


The FailureDomain API introduced in 4.13 was TechPreview and has now been replaced by an API in openshift/api; it no longer lives in the installer.

 

Therefore, we want to remove any unsupported API from the installer so that we can later add the supported API in order to support CPMS on OpenStack.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Create a warning-severity alert to notify the admin that packet loss is occurring due to failed OVS vswitchd lookups. This may occur if vswitchd is CPU constrained and there are also numerous lookups.

Use the metric ovs_vswitchd_netlink_overflow, which shows netlink messages dropped by the vswitchd daemon due to buffer overflow in userspace.

For the kernel equivalent, use the metric ovs_vswitchd_dp_flows_lookup_lost. Both metrics usually have the same value but may differ if vswitchd restarts.

Both of these metrics should be aggregated into a single alert that fires if the value has increased recently.
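A hedged sketch of what such an alerting rule could look like as a PrometheusRule; the alert name, namespace, window, and threshold are illustrative assumptions, while the metric names come from the description above.

# Sketch only; alert name, namespace, window and threshold are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ovs-flow-lookup-loss
  namespace: openshift-ovn-kubernetes
spec:
  groups:
  - name: ovs.rules
    rules:
    - alert: OVSPacketLossFromFailedLookups
      expr: |
        increase(ovs_vswitchd_netlink_overflow[15m])
          + increase(ovs_vswitchd_dp_flows_lookup_lost[15m]) > 0
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: Packet loss due to failed OVS vswitchd flow lookups.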

 

DoD: QE test case, code merged to CNO, metrics document updated ( https://docs.google.com/document/d/1lItYV0tTt5-ivX77izb1KuzN9S8-7YgO9ndlhATaVUg/edit )

< High-Level description of the feature ie: Executive Summary >

Goals

< Who benefits from this feature, and how? What is the difference between today’s current state and a world with this feature? >

Requirements

Requirements | Notes | IS MVP

(Optional) Use Cases

< What are we making, for who, and why/what problem are we solving?>

Out of scope

<Defines what is not included in this story>

Dependencies

< Link or at least explain any known dependencies. >

Background, and strategic fit

< What does the person writing code, testing, documenting need to know? >

Assumptions

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Customer Considerations

< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

Documentation Considerations

< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >

What does success look like?

< Does this feature have doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact?>

QE Contact

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Impact

< If the feature is ordered with other work, state the impact of this feature on the other work>

Related Architecture/Technical Documents

<links>

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment

Problem:

There's no way in the UI for the cluster admin to

  • change the default timeout period for the Web Terminal for all users
  • select an image from an image repository to be used as the default image for the Web Terminal for all users

Goal:

Expose in the UI the customization for all Web Terminal users that cluster admins can currently provide through wtoctl.

Why is it important?

Acceptance criteria:

  1. Cluster admin should be able to change the default timeout period for all new instances of the Web Terminal (it won't change the settings of existing instances)
  2. Cluster admin should be able to provide a new image as the default image for all new instances of the Web Terminal (it won't change the settings of existing instances)

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Questions:

  • Where will this information be shared?
  • What CLI is used to accomplish this today? Get link to docs

Description

Allow cluster admin to provide default image and/or timeout period for all cluster users

Acceptance Criteria

  1. Add Web Terminal tab in Cluster Configuration page under Developer tab
  2. This tab should be visible only to cluster admins
  3. Add 2 fields: one to change the default timeout and another to change the default image
  4. Default values should be pre-populated in the above fields from (see the sketch after this list):
Default Timeout - the WEB_TERMINAL_IDLE_TIMEOUT environment variable's value in the web-terminal-exec DevWorkspaceTemplate
Default Image - the .spec.components[].container.image field in the web-terminal-tooling DevWorkspaceTemplate

      5. Once the user changes these values and saves, the same resources above need to be updated (refer to the comment in epic https://issues.redhat.com/browse/ODC-7119 for more details)

      6. If the user has only read access to the DevWorkspaceTemplates, the Save button should not be enabled; if the user does not have read access to the DevWorkspaceTemplates, the Web Terminal tab should not be shown in the Configuration page

      7. Add e2e tests
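For orientation, a hedged sketch of the two DevWorkspaceTemplate fragments the console would read and update; the component names and values are illustrative assumptions, while the field paths follow the acceptance criteria above.

# Sketch only; component names and values are illustrative.
apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspaceTemplate
metadata:
  name: web-terminal-exec
spec:
  components:
  - name: web-terminal-exec
    container:
      env:
      - name: WEB_TERMINAL_IDLE_TIMEOUT
        value: 15m
---
apiVersion: workspace.devfile.io/v1alpha2
kind: DevWorkspaceTemplate
metadata:
  name: web-terminal-tooling
spec:
  components:
  - name: web-terminal-tooling
    container:
      image: <default-tooling-image>   # illustrative placeholder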

Additional Details:

The Timeout and Image components should be similar to the Web Terminal components (attached in the ticket).
Refer to the comment in epic https://issues.redhat.com/browse/ODC-7119 for more details.

Description

Update the help texts on the Initialize Terminal page as below:

1. "This Project will be used to initialize your command line terminal" to "Project used to initialize your command line terminal"
2. "Set timeout for the terminal." to "Pod timeout for your command line terminal"
3. "Set custom image for the terminal." to "Custom image used for your command line terminal"

Acceptance Criteria

Update the help texts on the Initialize Terminal page as below:

1. "This Project will be used to initialize your command line terminal" to "Project used to initialize your command line terminal"
2. "Set timeout for the terminal." to "Pod timeout for your command line terminal"
3. "Set custom image for the terminal." to "Custom image used for your command line terminal"

Additional Details:

Description

This is the follow-up story for PR https://github.com/openshift/console/pull/12718. A couple of tests that depend on YAML were added as manual tests; proper automated tests need to be added for them.

Acceptance Criteria

  1. Add automated tests instead of manual ones for PR https://github.com/openshift/console/pull/12718.
    After submit, switch to the Developer tab, come back to the Web Terminal tab, and check the populated values. They should reflect the newly updated values.
  2. Remove the delays (cy.wait) added in customization-of-web-terminal.ts (https://github.com/openshift/console/pull/12718/files#diff-bea278ba2b0622e97023a25a89e91b2194e3ba73824d81ea4b08046558ba8718) and make sure all the tests are passing
  3. Write test cases for the utils updatedWebTerminalExec and updatedWebTerminalTooling

Additional Details:

Refer PR - https://github.com/openshift/console/pull/12718 for more details

Overview 

HyperShift is being consumed by multiple providers. As a result, the need for documentation increases, especially around infrastructure/hardware/resource requirements, networking, etc.

Goal

Before the GA of Hosted Control Planes, we need to know/document:

  • What Infrastructure is managed by HCP?
  • What are the hardware requirements/prereqs for a hosted cluster?
  • What infrastructure resources get created by Kubernetes/OpenShift during hosted cluster lifecycle?
  • What networking requirements exist, e.g., open ports? 
  • What are storage requirements?
  • What default quota limits are there (e.g., EIP default limits per region)? Do we tell the user to increase them in production? 

DoD

The above questions are answered for all platforms we support, i.e., we need to answer for

  • [x] AWS 
  • [x] Baremetal via the agent
  • [x] KubeVirt 

Feature Overview (aka. Goal Summary)  

Add support for NAT Gateways in Azure when deploying OpenShift on this cloud to manage outbound network traffic, and make this the default option for new deployments.

Goals (aka. expected user outcomes)

While deploying OpenShift on Azure, the Installer will configure NAT Gateways as the default method to handle outbound network traffic, so we prevent the SNAT port exhaustion issues related to the outboundType configured by default today.

Requirements (aka. Acceptance Criteria):

The installer will use the NAT Gateway object from Azure to manage the outbound traffic from OpenShift.

The installer will create a NAT Gateway object per AZ in Azure so the solution is HA.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

Background

Using NAT Gateway for egress traffic is the recommended approach from Microsoft

This is also a common ask from different enterprise customers, as with the current solution used by OpenShift for outbound traffic management in Azure they are hitting SNAT port exhaustion issues.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal

  • Control Plane hosts should allow NAT Gateway for Internet egress for purposes of pulling images etc

Why is this important?

Scenarios

  1. Install a new cluster, control plane hosts access the Internet via NAT Gateway rather than via the public load balancer
  2. Install a new cluster, with user defined routing, control plane hosts access Internet via previously available UDR
  3. Upgraded clusters maintain their existing architecture

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Open questions::

  1. Control plane hosts are a must, but we should likely use the NAT Gateway for all hosts; we need to understand the pros/cons of doing so
  2. It'd be nice to understand what a potential migration of legacy clusters to the new architecture looks like and what options we have to automate it in a non-disruptive manner.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As an administrator, I want to be able to:

  •  Allow NAT Gateway as outboundType for clusters in Azure

so that I can achieve

  • Outbound access without exhausting SNAT ports

Acceptance Criteria:

Description of criteria:

  • NAT gateway as an outboundType in install-config (see the sketch below)

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.
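
A minimal sketch of the install-config change, assuming the new outboundType value is named NatGateway (the final value name and defaulting behavior are decided by the API review; the other values below are illustrative):

  apiVersion: v1
  baseDomain: example.com                       # illustrative
  platform:
    azure:
      region: eastus                            # illustrative
      baseDomainResourceGroupName: example-rg   # illustrative
      outboundType: NatGateway                  # assumed value name; existing values are Loadbalancer and UserDefinedRouting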

Feature Overview (aka. Goal Summary)  

You can use the oc-mirror OpenShift CLI (oc) plugin to mirror all required OpenShift Container Platform content and other images to your mirror registry by using a single tool. It provides the following features:

  • Provides a centralized method to mirror OpenShift Container Platform releases, Operators, helm charts, and other images.
  • Maintains update paths for OpenShift Container Platform and Operators.
  • Uses a declarative image set configuration file to include only the OpenShift Container Platform releases, Operators, and images that your cluster needs.
  • Performs incremental mirroring, which reduces the size of future image sets.

This feature tracks bringing the oc-mirror plugin to the IBM Power and IBM zSystems architectures.
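
For context, the declarative image set configuration file mentioned above is an ImageSetConfiguration; a minimal sketch, with the registry URL, channel, and catalog values illustrative:

  apiVersion: mirror.openshift.io/v1alpha2
  kind: ImageSetConfiguration
  storageConfig:
    registry:
      imageURL: registry.example.com/mirror/oc-mirror-metadata         # illustrative
  mirror:
    platform:
      channels:
      - name: stable-4.14                                              # illustrative channel
    operators:
    - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.14   # illustrative catalog
    additionalImages:
    - name: registry.redhat.io/ubi9/ubi:latest                         # illustrative image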

Goals (aka. expected user outcomes)

Bring the oc mirror plugin to IBM Power and IBM zSystem architectures

 

Requirements (aka. Acceptance Criteria):

oc mirror plugin on IBM Power and IBM zSystems should behave exactly like it does on x86 platforms.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

If this Epic is an RFE, please complete the following questions to the best of your ability:

Q1: Proposed title of this RFE

Support for oc mirror plugin (parity to x86)

Q2: What is the nature and description of the RFE?

The oc-mirror plugin will be the tool used for mirroring content, with parity to x86.

Q3: Why does the customer need this? (List the business requirements here)

Install a disconnected cluster without having x86 nodes available to manage the disconnected installation.

Q4: List any affected packages or components

https://docs.openshift.com/container-platform/4.12/installing/disconnected_install/installing-mirroring-disconnected.html 

 

Quay on the platform needs to be available for saving the images.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Cluster Infrastructure owned components should be running on Kubernetes 1.26
  • This includes
    • The cluster autoscaler (+operator)
    • Machine API operator
      • Machine API controllers for:
        • AWS
        • Azure
        • GCP
        • vSphere
        • OpenStack
        • IBM
        • Nutanix
    • Cloud Controller Manager Operator
      • Cloud controller managers for:
        • AWS
        • Azure
        • GCP
        • vSphere
        • OpenStack
        • IBM
        • Nutanix
    • Cluster Machine Approver
    • Cluster API Actuator Package
    • Control Plane Machine Set Operator

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.13 release, dependencies need to be updated to 1.26. This should be done by rebasing/updating as appropriate for the repository

Epic Goal

  • The goal of this epic is to upgrade all OpenShift and Kubernetes components that MCO uses to v1.27, which will keep it on par with the rest of the OpenShift components and the underlying cluster version.

Why is this important?

  • Uncover any possible issues with the openshift/kubernetes rebase before it merges.
  • MCO continues using the latest kubernetes/OpenShift libraries and the kubelet, kube-proxy components.
  • MCO e2e CI jobs pass on each of the supported platform with the updated components.

Acceptance Criteria

  • All stories in this epic must be completed.
  • Go version is upgraded for MCO components.
  • CI is running successfully with the upgraded components against the 4.14/master branch.

Dependencies (internal and external)

  1. ART team creating the go 1.20 image for upgrade to go 1.20.
  2. OpenShift/kubernetes repository downstream rebase PR merge.

Open questions::

  1. Do we need a checklist for future upgrades as an outcome of this epic? -> Yes, updated below.

Done Checklist

  • Step 1 - Upgrade go version to match rest of the OpenShift and Kubernetes upgraded components.
  • Step 2 - Upgrade Kubernetes client and controller-runtime dependencies (can be done in parallel with step 3)
  • Step 3 - Upgrade OpenShift client and API dependencies
  • Step 4 - Update kubelet and kube-proxy submodules in MCO repository
  • Step 5 - CI is running successfully with the upgraded components and libraries against the master branch.

Please review the following PR: https://github.com/openshift/machine-config-operator/pull/3598

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • Exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented. Especially new CSI drivers releases may bring new, currently unknown features. We expect that the amount of work will be roughly the same as in the previous releases. Of course, QE or docs can reject an update if it's too close to deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). Trying no-feature-freeze in 4.12. We will try to do as much as we can before FF, but we're quite sure something will slip past FF as usual.

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

This includes ibm-vpc-node-label-updater!

(Using separate cards for each driver because these updates can be more complicated)

Update all CSI sidecars to the latest upstream release from https://github.com/orgs/kubernetes-csi/repositories

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

Corresponding downstream repos have `csi-` prefix, e.g. github.com/openshift/csi-external-attacher.

This includes an update of the VolumeSnapshot CRDs in the cluster-csi-snapshot-controller-operator assets and of the client API in go.mod, i.e., copy all snapshot CRDs from upstream to the operator assets and run `go get -u github.com/kubernetes-csi/external-snapshotter/client/v6` in the operator repo.

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator 
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • openstack-cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • csi-driver-manila-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator
  • ibm-powervs-block-csi-driver-operator

 

  • cluster-storage-operator
  • cluster-csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

EOL, do not upgrade:

  • github.com/oVirt/csi-driver-operator

Update the driver to the latest upstream release. Notify QE and docs with any new features and important bugfixes that need testing or documentation.

(Using separate cards for each driver because these updates can be more complicated)

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Cluster Infrastructure owned components should be running on Kubernetes 1.27
  • This includes
    • The cluster autoscaler (+operator)
    • Machine API operator
      • Machine API controllers for:
        • AWS
        • Azure
        • GCP
        • vSphere
        • OpenStack
        • IBM
        • Nutanix
    • Cloud Controller Manager Operator
      • Cloud controller managers for:
        • AWS
        • Azure
        • GCP
        • vSphere
        • OpenStack
        • IBM
        • Nutanix
    • Cluster Machine Approver
    • Cluster API Actuator Package
    • Control Plane Machine Set Operator

Why is this important?

  • ...

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ...

Open questions::

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

To align with the 4.14 release, dependencies need to be updated to 1.27. This should be done by rebasing/updating as appropriate for the repository

Epic Goal*

What is our purpose in implementing this?  What new capability will be available to customers?

 
Why is this important? (mandatory)

What are the benefits to the customer or Red Hat?   Does it improve security, performance, supportability, etc?  Why is work a priority?

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Feature Overview

The Agent-based Installer requires the generated ISO to be booted on the target nodes manually. Support for PXE booting will allow customers to automate their installations via their DHCP/PXE infrastructure.

This feature allows generating installation boot artifacts ready to add to a customer-provided DHCP/PXE infrastructure.

Goals

As an OpenShift installation admin I want to PXE-boot the image generated by the openshift-install agent subcommand

Why is this important?

We have customers requesting this booting mechanism to make it easier to automate the booting of the nodes without having to actively place the generated image in a bootable device for each host.

Epic Goal

As an OpenShift installation admin I want to PXE-boot the image generated by the openshift-install agent subcommand

Why is this important?

We have customers requesting this booting mechanism to make it easier to automate the booting of the nodes without having to actively place the generated image in a bootable device for each host.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Currently, we have the kernel parameters in the iPXE script statically defined from what Assisted Service generates. If the default parameters were to change in RHCOS that would be problematic. Thus, it would be much better if we were to extract them from the ISO.

 

The kernel parameters in the ISO are defined in EFI/redhat/grub.cfg (UEFI) and /isolinux/isolinux.cfg (legacy boot)

ARM kernels are compressed with gzip, but most versions of ipxe cannot handle this (it's not clear what happens with raw pxe). See https://github.com/coreos/fedora-coreos-tracker/issues/1019 for more info.

If the platform is aarch64 we'll need to decompress the kernel like we do in https://github.com/openshift/machine-os-images/commit/1ed36d657fa3db55fc649761275c1f89cd7e8abe

As a user of the Agent-based Installer (ABI), I want to be able to perform the customizations via agent-tui when PXE booting so that I can modify network settings.

Implementation details:

Create a new baseImage asset that is inherited by agentImage and agentpxefiles. The baseImage prepares the initrd along with the necessary Ignition config and the network TUI, which are then consumed by agentImage and agentpxefiles.

The new command `agent create pxe-files` reads `pxe-base-url` from agent-config.yaml. The field will be optional in the YAML file. If the URL is provided, the command will generate an iPXE script specific to the given URL.
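
A sketch of the agent-config.yaml addition described above; the field name follows this story and may change, and the addresses/URLs are illustrative:

  apiVersion: v1beta1
  kind: AgentConfig
  metadata:
    name: example-agent-config                      # illustrative
  rendezvousIP: 192.168.111.80                      # illustrative
  pxe-base-url: http://pxe-server.example.com/pxe   # optional; if set, the generated iPXE script points at this URL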

Epic Goal

Support deploying multi-node clusters using platform none.

Why is this important?

As of Jan 2023 we have almost 5,000 clusters reported using platform none installed on-prem (metal, vmware or other hypervisors with no platform integration) out of a total of about 12,000 reported clusters installed on-prem.

Platform none is desired by users to be able to install clusters across different host platforms (e.g. mixing virtual and physical) where Kubernetes platform integration isn't a requirement. 

A goal of the Agent-Based Installer is to help users who currently can only deploy their topologies with UPI to be able to use the agent-based installer and get a simpler user experience while keeping all their flexibility.

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Currently there are validation checks for platform None in OptionalInstallConfig that limit the None platform to 1 control plane replica, 0 compute replicas, and the NetworkType to OVNKubernetes.

These validations should be removed so that the None platform can be installed on clusters of any supported configuration.
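
For example, with these validations relaxed, an install-config for a compact cluster on platform None would look roughly like this (names and replica counts are illustrative):

  apiVersion: v1
  baseDomain: example.com                # illustrative
  metadata:
    name: none-compact                   # illustrative
  controlPlane:
    name: master
    replicas: 3                          # previously limited to 1 for platform None
  compute:
  - name: worker
    replicas: 0
  networking:
    networkType: OVNKubernetes
  platform:
    none: {}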

Acceptance Criteria:

  • SNO cluster should still continue to work.
  • SNO validation should still check only OVNKubernetes network type is allowed
  • A compact or HA cluster can be installed with platform None, given the user has configured and deployed an external load balancer.

Feature Overview (aka. Goal Summary)  

Add support to the Installer to make the S3 bucket deletion process during cluster bootstrap on AWS optional.

Goals (aka. expected user outcomes)

Allow the user to opt out of deleting the S3 bucket created during cluster bootstrap on AWS.

Requirements (aka. Acceptance Criteria):

The user will be able to opt out of deleting the S3 bucket created during cluster bootstrap on AWS via the install-config manifest, so the Installer will not try to delete this resource when destroying the bootstrap resources.

The current behavior will remain the default while deploying OpenShift on AWS: both the bootstrap instance and the S3 bucket will be removed unless the user has opted out of this via the install-config manifest.

Background

Some ROSA customers have SCP policies that prevent the deletion of any S3 bucket preventing ROSA adoption for these customers.

Documentation Considerations

There will be documentation required for this feature to explain how to prevent the Installer from removing the S3 bucket, as well as an explanation of the security concerns of doing so, since the Installer will leave sensitive data used to bootstrap the cluster in the S3 bucket.

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal

  • Allow the user to opt-out of deleting the S3 bucket created during the cluster bootstrap on AWS.

Why is this important?

  • Some ROSA customers have SCP policies that prevent the deletion of any S3 bucket preventing ROSA adoption for these customers.

Scenarios

  1. As a user, I want to be able to instruct the Installer to keep the S3 bucket created during the cluster bootstrap so I can be compliant with my security policies, where the account used to deploy OpenShift does not have the privileges to remove any S3 bucket.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated

User Story:

As a developer, I want to:

  • Make the deletion of S3 buckets in AWS optional during bootstrap destroy

so that I can

  • successfully install clusters in restricted environments.

Acceptance Criteria:

Description of criteria:

  • A field is added in the install config for users to set the S3 bucket deletion to optional (see the sketch at the end of this story).
  • Once the field is set, the bootstrap destroy stage does not delete the S3 buckets.

Engineering Details:

  • Adding a field in the install config and piping it to terraform.
  • If the S3 bucket creation is done in the cluster stage instead of bootstrap, the destroy bootstrap code does not delete the bucket.
  • Might need to create two instances of S3 buckets, one in the bootstrap and one in the cluster stage and control which one is created using the new field in the install config.

This requires/does not require a design proposal.
This requires/does not require a feature gate.
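
A sketch of what the install-config opt-out could look like; the field name used here is hypothetical and the final API may differ:

  apiVersion: v1
  platform:
    aws:
      region: us-east-1                  # illustrative
      preserveBootstrapIgnition: true    # hypothetical field: keep the bootstrap S3 bucket when destroying the bootstrap resources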

Feature Overview (aka. Goal Summary)  

Console Support of OpenShift Pipelines Migration to Tekton v1 API
 

Goals (aka. expected user outcomes)

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Description of problem

Pipeline API version is upgrading to v1 with Red Hat Pipeline operator 1.11.0 release.
https://tekton.dev/vault/pipelines-main/migrating-v1beta1-to-v1/

Acceptance Criteria

  1. Remove Resources tab from the pipeline details page
  2. Remove Resources section in pipeline builder form

Questions

Does this have to be backward compatible?
Will the features be equivalent? Will the UX / tests / documentation have to be updated?

Description

As a user, 

Acceptance Criteria

  1. should add support for API version v1 for Pipeline as per the doc https://tekton.dev/vault/pipelines-main/migrating-v1beta1-to-v1/
  2. Update the tests and test data

Additional Details:

Description of problem:
When trying the old pipelines operator with the latest 4.14 build I couldn't see the Pipelines navigation items. The operator provides the Pipeline v1beta1, not v1.

Version-Release number of selected component (if applicable):
4.14 master only after https://github.com/openshift/console/pull/12729 was merged

How reproducible:
Always?

Steps to Reproduce:

  1. Setup a 4.14 nightly cluster
  2. Install Pipelines operator from https://artifacts.ospqa.com/builds/ (tested with 1.9.2/v4.13-2303061437)

Actual results:

  1. Pipelines navigation wasn't shown

Expected results:

  1. Pipelines plugin should work also with this Pipelines operator version?

Additional info:

Goals

Track goals/requirements for self-managed GA of Hosted control planes on BM using the agent provider. Mainly make sure: 

  • BM flow via the Agent is documented. 
    • Make sure the documentation with HyperShiftDeployment is removed.
    • Make sure the documentation uses the new flow without HyperShiftDeployment 
  • We have a reference architecture on the best way to deploy. 
  • UI for provisioning BM via MCE/ACM is complete (with host inventory). 

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Background, and strategic fit

Customers are looking at HyperShift to deploy self-managed clusters on bare metal. We have positioned the Agent flow as the way to get BM clusters due to its ease of use (it automates many of the rather mundane tasks required to set up BM clusters), and it's planned for GA with MCE 2.3 (in the OCP 4.13 timeframe).

 

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

Feature goal (what are we trying to solve here?)

Group all tasks for CAPI-provider-agent GA readiness

Does it need documentation support?

no

Feature origin (who asked for this feature?)

  •  

Reasoning (why it’s important?)

  • In order for the Hypershift Agent platform to be GA in ACM 2.9 we need to improve our coverage and fix the bugs in this epic 

Feature usage (do we have numbers/data?)

  • We have no data - the feature doesn’t exist anywhere

Feature availability (why should/shouldn't it live inside the UI/API?)

  • Please describe the reasoning behind why it should/shouldn't live inside the UI/API
  • Does this feature exist in the UI of other installers?

The test waits until all pods in the control plane namespace report Ready status, but collect-profiles is a Job whose pod sometimes completes before the other pods are ready.

Once the collect-profiles pod completes, it terminates and its status moves to ready=false.
From there onwards the test is stuck.

Goal

Support Dual-Stack Networking (IPv4 & IPv6) for hosted control planes. 

Why is this important?

Many of our customers, especially Telco providers, need to support IPv6 but can't do so immediately; they still have legacy IPv4 workloads. To support both stacks, an OpenShift cluster must be capable of allowing communication for both flavors, i.e., an OpenShift cluster running with hosted control planes should allow workloads to access both IP stacks.

Scenarios

As a cluster operator, you have the option to expose external endpoints using one or both address families, in any order that suits your needs. OpenShift does not make any assumptions about the network it operates on. For instance, if you have a small IPv4 address space, you can enable dual-stack on some of your cluster nodes and have the rest running on IPv6, which typically has a more extensive address space available.

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of problem:

When deploying a dual stack HostedCluster the KAS certificate won't be created with the proper SAN. If we look into a regular dual-stack cluster we can see the certificate gets generated as follows:

X509v3 Subject Alternative Name:
    DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:openshift, DNS:openshift.default, DNS:openshift.default.svc, DNS:openshift.default.svc.cluster.local, DNS:172.30.0.1, DNS:fd02::1, IP Address:172.30.0.1,
IP Address:FD02:0:0:0:0:0:0:1


whereas in a dual-stack hosted cluster this is the SAN:

X509v3 Subject Alternative Name:
    DNS:localhost, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, DNS:kube-apiserver, DNS:kube-apiserver.clusters-hosted.svc, DNS:kube-apiserver.clusters-hosted.svc.cluster.local, DNS:api.hosted.dual.lab, DNS:api.hosted.hypershift.local, IP Address:127.0.0.1, IP Address:172.31.0.1


As you can see it's missing the IPv6 pod+service IP on the certificate.

This causes issues on some controllers when contacting the KAS.

example:
E0711 16:51:42.536367       1 reflector.go:140] github.com/openshift/router/pkg/router/template/service_lookup.go:33: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://172.31.0.1:443/api/v1/services?limit=500&resourceVersion=0": x509: cannot validate certificate for 172.31.0.1 because it doesn't contain any IP SANs


Version-Release number of selected component (if applicable):

latest

How reproducible:

Always

Steps to Reproduce:

1. Deploy a HC with the networking settings specified, using the image with the dual-stack patches included: quay.io/jparrill/hypershift:OCPBUGS-15331-mix-413v4

Actual results:

KubeApiserver cert gets generated with the wrong SAN config.

Expected results:

KubeApiserver cert gets generated with the correct SAN config.

Additional info:

 

Description of problem:

Installing a 4.14 self-managed hosted cluster on a dual-stack hub with the "hypershift create cluster agent" command. The logs of the hypershift operator pod show a bunch of these errors:

{"level":"error","ts":"2023-06-08T13:36:26Z","msg":"Reconciler error","controller":"hostedcluster","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedCluster","hostedCluster":{"name":"hosted-0","namespace":"clusters"},"namespace":"clusters","name":"hosted-0","reconcileID":"a0a0f44f-7bbe-499f-95b0-e24b793ee48c","error":"failed to reconcile network policies: failed to reconcile kube-apiserver network policy: NetworkPolicy.extensions \"kas\" is invalid: spec.egress[1].to[0].ipBlock.except[1]: Invalid value: \"fd01::/48\": must be a strict subset of `cidr`","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}


The hostedcluster CR is showing the same ReconciliationError. Note that the networking section in the hostedcluster CR created by the "hypershift create cluster agent" command has only an IPv4 CIDR:

  networking:
    clusterNetwork:
    - cidr: 10.132.0.0/14
    networkType: OVNKubernetes
    serviceNetwork:
    - cidr: 172.31.0.0/16


while services have ipv6 nodeport addresses.

Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.14.0-0.nightly-2023-06-05-112833
Kustomize Version: v4.5.7
Server Version: 4.14.0-0.nightly-2023-06-05-112833
Kubernetes Version: v1.27.2+cc041e8

How reproducible:

100%

Steps to Reproduce:

1. Install a 4.14 OCP dual-stack BM hub cluster
2. Install MCE 2.4 and Hypershift operator
3. Install hosted cluster with "hypershift create cluster agent" command 

Actual results:

hosted cluster CR shows ReconciliationError:

  - lastTransitionTime: "2023-06-08T10:55:33Z"
    message: 'failed to reconcile network policies: failed to reconcile kube-apiserver
      network policy: NetworkPolicy.extensions "kas" is invalid: spec.egress[1].to[0].ipBlock.except[1]:
      Invalid value: "fd01::/48": must be a strict subset of `cidr`'
    observedGeneration: 2
    reason: ReconciliationError
    status: "False"
    type: ReconciliationSucceeded

Expected results:

ReconciliationSucceeded condition should be True

Additional info:

Logs and CRDs produced by the failed job: https://s3.upshift.redhat.com/DH-PROD-OCP-EDGE-QE-CI/ocp-spoke-assisted-operator-deploy/8044/post-mortem.zip

Description of problem:

When deploying a dual stack HostedCluster the worker nodes will not fully join the cluster because the CNI plugin doesn't start. If we check the cluster-network-operator pod we will see the following error:

I0711 13:46:16.012420       1 log.go:198] Failed to validate Network.Spec: hostPrefix 23 is larger than its cidr fd01::/48  



It seems that it is validating the IPv4 hostPrefix against the IPv6 pod network; this is how the networking spec for the HC looks:

  networking:
    clusterNetwork:
    - cidr: 10.132.0.0/14
    - cidr: fd01::/48
    networkType: OVNKubernetes
    serviceNetwork:
    - cidr: 172.31.0.0/16
    - cidr: fd02::/112
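
For reference, a dual-stack networking spec that carries a hostPrefix appropriate to each address family would look roughly like this (prefix sizes are illustrative):

  networking:
    clusterNetwork:
    - cidr: 10.132.0.0/14
      hostPrefix: 23
    - cidr: fd01::/48
      hostPrefix: 64                     # an IPv6-sized prefix, instead of validating the IPv4 value against fd01::/48
    networkType: OVNKubernetes
    serviceNetwork:
    - cidr: 172.31.0.0/16
    - cidr: fd02::/112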

 

Version-Release number of selected component (if applicable):

latest

How reproducible:

Always

Steps to Reproduce:

1. Deploy a HC with the networking settings specified, using the image with the dual-stack patches included: quay.io/jparrill/hypershift:OCPBUGS-15331-mix-413v2

Actual results:

CNI is not deployed

Expected results:

CNI is deployed

Additional info:

Discussed on slack https://redhat-internal.slack.com/archives/C058TF9K37Z/p1689078655055779

To run a HyperShift management cluster in disconnected mode we need to document which images need to be mirrored and potentially modify the images we use for OLM catalogs.

ICSP mapping only happens for image references with a digest, not a regular tag. We need to address this for images we reference by tag:
CAPI, CAPI provider, OLM catalogs

As a user of Hosted Control Planes, I would like the HCP Specification API to support both ICSP and IDMS.

IDMS is replacing ICSP in OCP 4.13+.  hcp.Spec.ImageContentSources was updated in OCPBUGS-11939 to replace ICSP with IDMS. This needs to be reverted and something new added to support IDMS in addition to ICSP.
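
For reference, IDMS is expressed as an ImageDigestMirrorSet resource; a minimal sketch with illustrative registry hostnames:

  apiVersion: config.openshift.io/v1
  kind: ImageDigestMirrorSet
  metadata:
    name: example-idms                   # illustrative
  spec:
    imageDigestMirrors:
    - source: registry.redhat.io/redhat  # illustrative source
      mirrors:
      - mirror.example.com/redhat        # illustrative mirror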

Description of problem:

When user configures HostedCluster.Spec.additionalTrustBundle, there are some deployments that add this trust bundle using a volume. The ignition-server deployment won't add this volume.

Version-Release number of selected component (if applicable):

Any

How reproducible:

Always

Steps to Reproduce:

1. Deploy a HostedCluster with additionalTrustBundle
2. Check ignition-server deployment configuration

Actual results:

No trust bundle configured

Expected results:

Trust bundle configured.

Additional info:

There is missing code.

Ignition-server-proxy does configure the trust bundle: https://github.com/openshift/hypershift/blob/main/hypershift-operator/controllers/hostedcluster/ignitionserver/ignitionserver.go#L745-L748
Ignition-server does not: https://github.com/openshift/hypershift/blob/main/control-plane-operator/controllers/hostedcontrolplane/ignitionserver/ignitionserver.go#L694
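
For illustration, the missing wiring is the usual ConfigMap-as-volume pattern that ignition-server-proxy already has; a rough sketch only, with resource names and the mount path illustrative rather than taken from the actual HyperShift manifests:

  # excerpt of the ignition-server Deployment pod template
  spec:
    volumes:
    - name: additional-trust-bundle                # illustrative name
      configMap:
        name: user-ca-bundle                       # illustrative; carries HostedCluster.spec.additionalTrustBundle
    containers:
    - name: ignition-server
      volumeMounts:
      - name: additional-trust-bundle
        mountPath: /etc/pki/ca-trust/extracted/pem # illustrative mount path
        readOnly: true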

As a user of hosted clusters in disconnected environments, I would like RegistryClientImageMetadataProvider to support registry overrides so that registry lookups utilize the registries in the registry overrides rather than what might be listed in the image reference.

Description of problem:

HostedClusterConfigOperator doesn't check the OperatorHub object in the Hosted Cluster. As a result, the default CatalogSources cannot be disabled. If there are failing CatalogSources, operator deployments might be impacted.

Version-Release number of selected component (if applicable):

Any

How reproducible:

Always

Steps to Reproduce:

1. Deploy a HostedCluster
2. Connect to the hostedcluster and patch the operatorhub object: `oc --kubeconfig ./hosted-kubeadmin patch OperatorHub cluster --type json -p '[{"op": "add", "path": "/spec/disableAllDefaultSources", "value": true}]'` 
3. CatalogSources objects won't be removed from the openshift-marketplace namespace.

Actual results:

CatalogSources objects are not removed from the openshift-marketplace namespace.

Expected results:

CatalogSources objects are removed from the openshift-marketplace namespace.

Additional info:

This is the code where we can see that the reconcile will create the CatalogSources every time.

https://github.com/openshift/hypershift/blob/dba2e9729024ce55f4f2eba8d6ccb8801e78a022/control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go#L1285

Currently, OLM catalogs placed in the control plane use image references to a tag so that the latest can be pulled when the catalog is restarted. There is a cron job that restarts the deployment on a regular basis.

The issue with this is that the image cannot be mirrored for offline deployments, nor can it be used in environments (IBM Cloud) where all images running on a management cluster need to be approved beforehand by digest.

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

 

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Complete during New status.

 

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Problem:

For developers of serverless functions, we don't currently provide any samples.

Goal:

Provide Serverless Function samples in the sample catalog.  These would be utilizing the Builder Image capabilities.

Why is it important?

Use cases:

  1. <case>

Acceptance criteria:

  1. <criteria>

Dependencies (External/Internal):

  • The Serverless team would need to provide a sample repo for serverless functions
  • The Samples operator would need to be updated

Design Artifacts:

Exploration:

Note:

  • Need to define the API and confirm with other stakeholders - need to support a serverless func image stream "tag"
  • Serverless team will need to provide updates to the existing Image Streams, as well as maintain the sample repositories which are referenced in the Image Streams.
  • Need to understand the relationship between ImageStream and Image Stream Tag
  • Should serverless function samples in the catalog have the "builder image" tag, or should it be "serverless function"?

Description

As an operator author, I want to provide additional samples that are tied to an operator version, not an OpenShift release. For that, I want to create a resource to add new samples to the web console.

Acceptance Criteria

  1. openshift/console-operator update so that new clusters have the new ConsoleSample CRD
  2. Add RBAC permissions (roles and rolebinding?) so that all users have access to ConsoleSample resources

Additional Details:

Description

As an operator author, I want to provide additional samples that are tied to an operator version, not an OpenShift release. For that, I want to create a resource to add new samples to the web console.

Acceptance Criteria

  1. Load all cluster-scoped ConsoleSamples resources and show them in the sample catalog
  2. Filter duplicates based on the localization annotations (see enhancement proposal)
    1. All localization labels are optional
    2. Fallback for the name annotation should be metadata.name
    3. Fallback for the language should be english/no annotation
    4. Create a utils function with some unit tests
  3. Ensure that the Samples Import also works with Serverless functions (func.yaml detection)
  4. Show the new VSCode and IntelliJ extension cards from the "Add Serverless function" when importing a Serverless function sample.
  5. Provide some ConsoleSample YAMLs in the PR description (a rough sketch follows the Additional Details below)

Additional Details:

  1. https://github.com/openshift/enhancements/pull/1429
  2. https://github.com/openshift/api/pull/1503
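
A rough ConsoleSample sketch of the kind requested in the acceptance criteria; the field names follow the enhancement proposal and API PR linked above, and the metadata and repository URL are illustrative:

  apiVersion: console.openshift.io/v1
  kind: ConsoleSample
  metadata:
    name: serverless-function-example            # illustrative
  spec:
    title: Serverless function example           # illustrative
    abstract: A sample serverless function.      # illustrative
    description: A sample serverless function that can be imported as a Git repository.
    tags:
    - serverless-function
    source:
      type: GitImport
      gitImport:
        repository:
          url: https://github.com/example-org/example-serverless-function   # illustrative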

Feature Overview (aka. Goal Summary)  

As Arm adoption grows, OpenShift on Arm is a key strategic initiative for Red Hat. Key to success is support on all key cloud providers adopting this technology. Google has announced support for Arm in their GCP offering, and we need to support OpenShift in this configuration.

Goals (aka. expected user outcomes)

The ability to have OCP on Arm running in a GCP instance

Requirements (aka. Acceptance Criteria):

OCP on Arm running in a GCP instance

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Description:

Update 4.14 documentation to reflect new GCP support on ARM machines.

Updates: 

  • Add google instance types for ARM
  • Add config parameters 
  • Supported installation platforms 
  • Release note

Acceptance criteria: 

  • Dev and QE ack
  • PR is merged 

 

Description: 

In order to add instance types to the OCP documentation, there needs to be a .md file in the OpenShift installer repo that contains the 64-bit ARM machine types that have been tested and are supported on GCP. 

Create a PR in the OpenShift installer repo that creates a new .md file that shows the supported instance types 

Acceptance criteria: 

  • Dev and QE ack from ARM side 
  • Dev and QE ack from Installer side
  • Approval from installer product manager 
  • PR is merged and ready to be used for OCP docs referencing 

Feature Overview (aka. Goal Summary)  

Azure File CSI supports both the SMB and NFS protocols. Currently we only support SMB, and there is a strong demand from IBM and individual customers to support NFS for POSIX compliance reasons.

 

Goals (aka. expected user outcomes)

Support Azure File CSI with NFS.

The Azure File operator will not automatically create an NFS storage class; we will document how to create one.

 

Requirements (aka. Acceptance Criteria):

There are some concerns about the way Azure File CSI deals with NFS. It doesn't respect the FSGroup policy supplied in the pod definition. This breaks the Kubernetes convention where a pod should be able to define its own FSGroup policy; instead, Azure File CSI sets a per-driver policy that pods can't override.

 

We brought this problem up with MSFT but there is no fix planned in the driver. Given the pressure from the field, we are going to support NFS with an OnRootMismatch default and document this specific behavior in our documentation.

 

 

Use Cases (Optional):

As an OCP on Azure admin, I want my users to be able to consume NFS-based PVs through Azure File CSI.

As an OCP on Azure user, I want to attach NFS-based PVs to my pods.

As an ARO customer, I want to consume NFS-based PVs.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

Running two drivers, one for NFS and one for SMB to solve the FSGroup issue.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

This feature is candidate to be backported up to 4.12 if possible.

 

Documentation Considerations

Document that Azure File CSI NFS is supported, how to create a storage class as well as the FSGroup issue.

 

  1. Azure File NFS Supportability
  2. We currently have a CSI driver for Azure Files that supports SMB connectivity. In the interest of maintaining POSIX compliance, supported NFS connectivity would be required.
  3. The goal would be to have supported parity with the current AWS offerings that use "AWS EFS Provisioner" to automatically provision NFS volumes.

It has been decided to support the driver as it is today (see spike STOR-992), knowing that it violates the Kubernetes fsGroupChangePolicy standard, which lets a pod decide what FSGroup policy should be applied. Azure File with NFS applies an FSGroup policy at the driver level and pods cannot override it. We will keep the driver's default (OnRootMismatch) and document this unconventional behavior. Also, the Azure File CSI operator will not create a storage class for NFS; admins will need to create it manually, and this will be documented.
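
For reference, a minimal sketch of the NFS storage class an admin would create manually, assuming the upstream Azure File CSI driver parameters (protocol, skuName) and that the NFS share is backed by a premium storage account:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-nfs          # illustrative name
provisioner: file.csi.azure.com
parameters:
  protocol: nfs                    # selects the NFS protocol instead of SMB
  skuName: Premium_LRS             # assumption: NFS shares require a premium storage account
reclaimPolicy: Delete
volumeBindingMode: Immediate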

There is no need for specific development in the driver or the operator; engineering will make sure we have working CI.

1. Proposed title of this feature request

Enable privileged containers to view rootfs of other containers

 

2. What is the nature and description of the request?

The skip_mount_home=true field in /etc/containers/storage.conf causes the mount propagation of container mounts to not be private, which allows privileged containers to access the rootfs of other containers. This is a fix for bug 2065283 (see comment #32 [2]).

This RFE is to enable that field by default in OpenShift, as well as to verify there are no performance regressions when applying it.

 

3. Why does the customer need this? (List the business requirements here)

Customer's use case:

Our agent runs as a DaemonSet in k8s clusters and monitors the node.
Running with mount propagation set to HostToContainer allows the agent to access any container file, including containers that start running after agent startup. With this setting, when a new container starts, a new mount is created and added to the host mount namespace and also to the agent container, and thereby the agent can access the container's files.
For example, the agent is mounted at /host and can access the filesystem of another container by path
/host/var/lib/containers/storage/overlay/xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/merged/test_file

This approach works in k8s clusters and OpenShift 3, but not in OpenShift 4. How can I make the agent pod get notified about any new mount created on the node and get access to it as well?

The workaround for that was provided in bug 2065283 (see comment #32 [2]).
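
For illustration only, a hypothetical MachineConfig that lays down a minimal /etc/containers/storage.conf with skip_mount_home enabled; the referenced bug comment remains the authoritative workaround, and a real storage.conf carries additional settings that are omitted here:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-skip-mount-home            # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/containers/storage.conf
          mode: 0644
          overwrite: true
          contents:
            # URL-encoded TOML:
            #   [storage]
            #   driver = "overlay"
            #
            #   [storage.options]
            #   skip_mount_home = "true"
            source: data:,%5Bstorage%5D%0Adriver%20%3D%20%22overlay%22%0A%0A%5Bstorage.options%5D%0Askip_mount_home%20%3D%20%22true%22%0A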

 

4. List any affected packages or components.

CRI-O, Node, MCO.

 

Additional information in this Slack discussion [3].

 

 

[1] https://docs.openshift.com/container-platform/4.11/post_installation_configuration/machine-configuration-tasks.html#create-a-containerruntimeconfig_post-install-machine-configuration-tasks
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2065283#c32
[3] https://coreos.slack.com/archives/CK1AE4ZCK/p1670491480185299

Epic Goal

  • use the `skip_mount_home` parameter of /etc/containers/storage.conf to allow containers to see other containers' rootfs

Why is this important?

Scenarios

  1. As an author of a node agent, I would like my pods to be able to inspect the rootfs of other containers to gain insight into their behavior.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. https://issues.redhat.com/browse/PERFSCALE-2249

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)  

Currently, SCCs are part of the OpenShift API and are subject to modifications by customers. This leads to a constant stream of issues:

  • Modifications of out-of-the-box SCCs cause core workloads to malfunction
  • Addition of new higher priority SCCs may overrule existing pinned out-of-the-box SCCs during SCC admission and cause core workloads to malfunction

Goals (aka. expected user outcomes)

  • Create a way to prevent SCC preemption and modifications of out-of-the-box SCCs  

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context that is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Summary (PM+lead)

Currently, SCCs are part of the OpenShift API and are subject to modifications by customers. This leads to a constant stream of issues:

  • Modifications of out-of-the-box SCCs may cause core workloads to malfunction
  • Addition of new higher priority SCCs may overrule existing pinned out-of-the-box SCCs during SCC admission and cause core workloads to malfunction

We need to find and implement schemes to protect core workloads while retaining the API guarantee for modifications of SCCs (unfortunately).

Motivation (PM+lead)

Goals (lead)

Non-Goals (lead)

Deliverables

Proposal (lead)

User Stories (PM)

Dependencies (internal and external, lead)

Previous Work (lead)

Open questions (lead)

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)  

cgroup v2 is GA as of OCP 4.13.

RHEL 9 defaults to cgroup v2 and we want to make sure we are in sync.

cgroup v1 support in systemd will end by the end of 2023.

 

Goals (aka. expected user outcomes)

  1. Default for new clusters
  2. Non-default for upgrading clusters: customers with cgroup v1 upgrading from 4.13 to 4.14 will still have cgroup v1 (it will not be a forced migration)
  3. Upgrading customers will have the option to switch to cgroup v2 as a day-2 operation

What needs to be done

  1. Default in 4.14
  2. Change 4.13.z so that clusters upgraded to 4.14 stay on cgroup v1
  3. NTO changes to default to v1
  4. Test with cgroup v1 (where cgroup v2 was used previously)
  5. Release notes on applications that are affected:
     • If you run third-party monitoring and security agents that depend on the cgroup file system, update the agents to versions that support cgroup v2.
     • If you run cAdvisor as a stand-alone DaemonSet for monitoring pods and containers, update it to v0.43.0 or later.
     • If you deploy Java applications, prefer versions that fully support cgroup v2:
       • OpenJDK / HotSpot: jdk8u372, 11.0.16, 15 and later
       • IBM Semeru Runtimes: jdk8u345-b01, 11.0.16.0, 17.0.4.0, 18.0.2.0 and later
       • IBM Java: 8.0.7.15 and later
  6. Announcement blog (and warning about forced migration in the future)
  7. Reach out to TRT

 

https://docs.google.com/document/d/1i6IEGjaM0-NeMqzm0ZnVm0UcfcVmZqfQRG1m5BjKzbo/edit 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Make cgroup v2 default in 4.14

Why is this important?

  • To bring the advantages of cgroup v2 to users of 4.14+

Scenarios

  1. As a cluster owner I want to run my system using cgroup v2 for its added benefits.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

This issue consists of the following changes.

  1. Update the existing 4.13 MCO code to store the current cgroup mode in the node config spec (the Node config object sketched below), so that it can be referred to during 4.14 and future releases.
  2. Request all clusters to upgrade to the above changes before upgrading to 4.14 by bumping the minor version in the openshift/cincinnati-graph-data repository. Refer here (reach out to #forum-updates).
  3. Remove the explicit setting of cgroup v1 and update it to cgroup v2 in the 4.14/master code of the MCO repo.
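
For context, the cluster-scoped Node config object (nodes.config.openshift.io) is what carries the cgroup mode; a sketch of the day-2 switch to cgroup v2 would look roughly like this:

apiVersion: config.openshift.io/v1
kind: Node
metadata:
  name: cluster
spec:
  cgroupMode: "v2"   # set to "v1" to stay on cgroup v1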

Feature Overview

Users of the OpenShift Console get a streamlined, visual experience when discovering and installing OLM-managed operators in clusters that run on cloud providers with support for short-lived token authentication enabled. Users intuitively become aware when this is the case and are put on the happy path to configure OLM-managed operators with the necessary information to support AWS STS.

 

Goals:

Customers do not need to re-learn how to enable AWS STS authentication support for each and every OLM-managed operator that supports it. The experience is standardized and repeatable, so customers spend less time on initial configuration and more time implementing business value. The process is so easy that OpenShift is perceived as an enabler for an increased security posture.

 

Requirements:

  • based on OCPBU-559 and OCPBU-560, the installation and configuration experience for any OLM-managed operator using short-lived token authentication is streamlined using the OCP console in the form of a guided process that avoids misconfiguration or unexpected behavior of the operators in question
  • the OCP Console helps in detecting when the cluster itself is already using AWS STS for core functionality
  • the OCP Console helps discover operators capable of AWS STS authentication and their IAM permission requirements
  • the OCP Console drives the collection of the required information for AWS STS authentication at the right stages of the installation process and stops the process when the information is not provided
  • the OCP Console implements this process with minimal differences across different cloud providers and is capable of adjusting the terminology depending on the cloud provider that the cluster is running on

 

Use Cases:

  • High-level mockups found here: Operators & STS
  • A cluster admin browses the OperatorHub catalog and looks at the details view of a particular operator, there they discover that the cluster is configured for AWS STS
  • A cluster admin browsing the OperatorHub catalog content can filter for operators that support the AWS STS flow described in OCPSTRAT-171
  • A cluster admin reviewing the details of a particular operator in the OperatorHub view can discover that this operator supports AWS STS authentication
  • A cluster admin installing a particular operator can get information about the AWS IAM permission requirements the operator has
  • A cluster admin installing a particular operator is asked to provide AWS ARN that is required for AWS STS prior to the actual installation step and is prevented from continuing without this information
  • A cluster admin reviewing an installed operator with support for AWS STS can discover the related CredentialsRequest object that the operator created in an intuitive way (not generically via related objects that have an ownership reference or as part of the InstallPlan)

Out of Scope

  • update handling and blocking in case of increased permission requirements in the next / new version of the operator
  • more complex scenarios with multiple IAM roles/service principals resulting in multiple CredentialRequest objects used by a single operator

 

Background

The OpenShift Console today provides little to no support for configuring OLM-managed operators for short-lived token authentication. Users are generally unaware if their cluster runs on a cloud provider and is set up to use short-lived tokens for its core functionality and users are not aware which operators have support for that by implementing the respective flows defined in OCPBU-559 and OCPBU-560.

Customer Considerations

Customers may or may not be aware of short-lived token authentication support. They need proper context and pointers to follow-up documentation to explain the general concept and the specific configuration flow the Console supports. It needs to become clear that the Console cannot 100% automate the overall process and some steps need to be run outside of the cluster/Console using cloud-provider-specific tooling.

This epic is tracking the console work needed for STS enablement, as well as the documentation needed to enable operator teams to use this new flow. This does not track HyperShift inclusion of CCO.

 

Plan is to backport to 4.12

 

install flow:

  • User knows which operators do and don’t support STS on a ROSA STS cluster
  • User installs Operator
  • UI has an option to add a RoleARN to the Subscription's config.env for the operator to add to the CredentialsRequest during install
  • Operator creates a CredentialsRequest with AWS_ROLE_ARN (see the sketch after this list)
  • Operator watches secret with special name (TBD) in the namespace
    • Secret name propagated to operator via CRD field (example
    • Mount the bound service account token on the deployment  (example)
  • CCO creates a secret in the pod namespace based on the CredentialsRequest created
  • Operator extracts AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN from the secret
  • Operator is able to configure the cloud provider SDK
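
A hedged sketch of the CredentialsRequest an operator might create in this flow; the token-related field names (cloudTokenPath, stsIAMRoleARN) and all resource names/permissions below are assumptions for illustration:

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: example-operator                         # hypothetical
  namespace: openshift-cloud-credential-operator
spec:
  serviceAccountNames:
    - example-operator
  secretRef:
    name: example-operator-aws-credentials       # secret CCO creates in the pod namespace
    namespace: example-operator-namespace
  cloudTokenPath: /var/run/secrets/openshift/serviceaccount/token   # assumption: bound SA token mount path
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
      - effect: Allow
        action:
          - s3:GetObject                         # hypothetical permission
        resource: "*"
    stsIAMRoleARN: arn:aws:iam::123456789012:role/example-role   # from the user-provided role ARN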

STS - Security Token Service

Cluster is in STS mode when:

  1.  AWS
  2. credentialsMode in the `cloudcredential` resource  is "Manual"
  3. serviceAccountIssuer is non empty

AC: Inform user on the Operator Hub item details that the cluster is in the STS mode 

As a user of the console I would like to know which operators are safe to install (i.e. support tokenized auth or don't talk to the cloud provider).

 

AC: Add filter  to the Operator Hub for filtering operators which have Short Lived Token Enabled

As a user of the console, I would like to provide the required fields for tokenized auth at install time (wrapping and providing sane defaults for what I can do manually in the CLI).

The role ARN provided by the user should be added to the service account of the installed operator as an annotation.

Only manual subscription approval is supported in STS mode - the automatic option should not be the default or should be greyed out entirely

 

AC: Add an input field to the operator install page where the user can provide the `roleARN` value. This value will be set on the operator's Subscription resource when installing the operator.
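
A minimal sketch of what the resulting Subscription could look like, assuming the ROLEARN environment variable convention and illustrative operator/catalog names:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: example-operator                 # hypothetical operator
  namespace: openshift-operators
spec:
  name: example-operator
  channel: stable
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Manual            # only manual approval is supported in STS mode
  config:
    env:
      - name: ROLEARN                    # assumption: env var name consumed by the operator
        value: "arn:aws:iam::123456789012:role/example-role"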

Feature Overview (aka. Goal Summary)  

Upstream K8s deprecated PodSecurityPolicy and replaced it with a new built-in admission controller that enforces the Pod Security Standards (see here for the motivations for deprecation). There is an OpenShift-specific dedicated pod admission system called Security Context Constraints. Our aim is to keep the Security Context Constraints pod admission system while also allowing users to have access to the Kubernetes Pod Security Admission. 

With OpenShift 4.11, we turned on the Pod Security Admission with global "privileged" enforcement. Additionally, we set the "restricted" profile for warnings and audit. This configuration made it possible for users to opt their namespaces in to Pod Security Admission with the per-namespace labels. We also introduced a new mechanism that automatically synchronizes the Pod Security Admission "warn" and "audit" labels.

With OpenShift 4.15, we intend to move the global configuration to enforce the "restricted" pod security profile globally. With this change, the label synchronization mechanism will also switch into a mode where it synchronizes the "enforce" Pod Security Admission label rather than the "audit" and "warn". 

 

In OpenShift 4.14, we intend to deliver functionality in code that will help accelerate moving to PSA enforcement. This feature tracks those deliverables. 

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Complete during New status.

 

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context that is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal*

Deliver tools and code that help toward PSA enforcement.

Why is this important? (mandatory)

What are the benefits to the customer or Red Hat?   Does it improve security, performance, supportability, etc?  Why is work a priority?

 
Scenarios (mandatory) 

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.  

  1.  

 
Dependencies (internal and external) (mandatory)

What items must be delivered by other teams/groups to enable delivery of this epic. 

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

What

Don't enforce system defaults on a namespace's pod security labels if they are managed by a user.

Why

If the managedFields (https://kubernetes.io/docs/reference/using-api/server-side-apply/#field-management) indicate that a user changed the pod security labels, we should not enforce system defaults.

A user might not be aware that the label syncer can be turned off and tries to manually change the state of the pod security profiles.

This fight between a user and the label syncer can cause violations.
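
For illustration, a namespace where a user has set the pod security labels themselves might look like the sketch below; the managedFields of such a namespace would record the user, not the label syncer, as the manager of these labels, so the syncer should leave them alone:

apiVersion: v1
kind: Namespace
metadata:
  name: example-app                              # hypothetical namespace
  labels:
    # Labels applied directly by a user rather than by the label syncer
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted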

The MVP aims at refactoring MirrorToDisk and DiskToMirror for OCP releases:

  • Execute command line
  • Copy & untar release-index
  • Inspect untarred folder
  • gather release images from disk
  • Generate artifacts (icsp)
  • bulk pull-push payload images
  • gather release-index
  • unit test & e2e

As an MVP, this epic covers the work for RFE-3800 (includes RFE-3393 and RFE-3733) for mirroring releases.
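
As a rough sketch of the MirrorToDisk input this MVP targets, an ImageSetConfiguration limited to OCP release mirroring could look like the following (channel name and versions are illustrative):

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  platform:
    graph: true
    channels:
      - name: stable-4.14        # illustrative channel
        minVersion: 4.14.0
        maxVersion: 4.14.1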

The full description / overview of the enclave support is best described here 

The design document can be found here 

Upcoming epics, such as CFE-942 will complete the RFE work with mirroring operators, additionalImages, etc.

 

Architecture Overview (diagram)

 

 

 

As a developer, I want to create an implementation based on a local container registry as the backing technology for mirroring to disk, so that:

< High-Level description of the feature ie: Executive Summary >

Goals

< Who benefits from this feature, and how? What is the difference between today’s current state and a world with this feature? >

Requirements

Requirements Notes IS MVP
     
    • (Optional) Use Cases

< What are we making, for who, and why/what problem are we solving?>

Out of scope

<Defines what is not included in this story>

Dependencies

< Link or at least explain any known dependencies. >

Background, and strategic fit

< What does the person writing code, testing, documenting need to know? >

Assumptions

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Customer Considerations

< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

Documentation Considerations

< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >

What does success look like?

< Does this feature have doc impact? Possible values are: New Content, Updates to existing content, Release Note, or No Doc Impact?>

QE Contact

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Impact

< If the feature is ordered with other work, state the impact of this feature on the other work>

Related Architecture/Technical Documents

<links>

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in production environment

Goal

Additional improvements to segment, to enable the proper gathering of user telemetry and analysis

Problem

Currently, we have no accurate telemetry of the OpenShift Console usage across all fleet clusters. We should be able to utilize the auth and console telemetry to glean details which will allow us to get a picture of console usage by our customers.

There is no way to properly track specific pages

  1. Page titles are localized
  2. Details pages include the project name

Acceptance criteria

  1.  User telemetry page title for all the resource details pages should be changed to resource · tab-name format. Product name should not be part of user telemetry page title
  2. Page title in UI for all the resource details pages should be changed to resource-name · resource · tab-name · Product-name format
  3. User telemetry page title should be non-translated value for tracking purpose
  4. Page title in UI should be translated value

Note:

  • do we need to do anything to be GDPR compliant?

Description

Change page title for all resource details pages to {resource-name} · {resource} ·  {tab-name} · OKD

Acceptance Criteria

  1. Page title for all the resource details pages should be changed to  {resource-name} · {resource} · {tab-name} · OKD format
  2. If details page does not have tabs, then {tab-name} can be just "Details"
  3. Page title should be translated value

Additional Details:

Need to check all the resource pages which have details page and change the title.

labelKeyForNodeKind now returns a translated value; before, it used to return the label key. So change the method name from labelKeyForNodeKind to getTitleForNodeKind.

Description

Update page title to have a non-translated title in {resource-name} · {resource} · {tab-name} · OKD format

All page titles of resource details pages are to be added as a non-translated value in {resource-name} · {resource} · {tab-name} · OKD format inside the <title> component, as an attribute named, for example, data-title-id, and this value is used in fireUrlChangeEvent to send it as the title for the telemetry page event. Refer to spike https://issues.redhat.com/browse/ODC-7269 for more details

 

Acceptance Criteria

  1. Add data-title-id attribute for all the resource details page title component 
  2. Use data-title-id as title value while sending URL change event to telemetry 
  3. If data-title-id attribute value is not present in title use page title value

Additional Details:

Refer spike https://issues.redhat.com/browse/ODC-7269 for more details

Feature Overview (aka. Goal Summary)  

One of the steps in doing a disconnected environment install is to mirror the images to a designated system. This feature enhances oc-mirror to handle the multi release payload, that is, the payload that contains all the platform images (x86, Arm, IBM Power, IBM Z). This is a key feature towards supporting disconnected installs in a multi-architecture compute, i.e. mixed architecture, cluster environment.

 

Goals (aka. expected user outcomes)

Customers will be able to use oc-mirror to enable the multi payload in a disconnected environment.

 

Requirements (aka. Acceptance Criteria):

Allow oc-mirror to mirror the multi release payload
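
A hedged sketch of what an ImageSetConfiguration requesting the multi payload could look like, assuming the platform architectures field is how the selection is expressed (the exact behavior is what this feature delivers):

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  platform:
    architectures:
      - multi                    # request the multi (manifest-listed) release payload
    channels:
      - name: stable-4.14        # illustrative channel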

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context that is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal

  • Add 'oc new-app' support for creating image streams with manifest list support
  • Add 'oc new-build' support for creating image streams with manifest list support

Why is this important?

  • oc commands that create image streams should work correctly on multi-arch clusters

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://issues.redhat.com/browse/IR-289
  2. https://issues.redhat.com/browse/IR-192

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

ACCEPTANCE CRITERIA

  • When creating a build with 'oc new-build' that points to a manifest-listed image, users should be able to set an "--import-mode=" flag to 'PreserveOriginal' to preserve all architectures of the manifest list and let any builder pods build from the manifest-listed image.
  • 'oc new-build' should not cause pods to fail due to being the incorrect architecture 
  • Ensure node scheduling happens properly on a heterogeneous cluster when running 'oc new-build'

 

ImportMode api reference: https://github.com/openshift/api/blob/master/image/v1/types.go#L294
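
For reference, a sketch of the image stream tag that the --import-mode flag maps to; names and the source image are illustrative:

apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: example-app
spec:
  tags:
    - name: latest
      from:
        kind: DockerImage
        name: quay.io/example/manifest-listed-image:latest   # placeholder manifest-listed image
      importPolicy:
        importMode: PreserveOriginal    # keep all architectures of the manifest list
      referencePolicy:
        type: Source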

ACCEPTANCE CRITERIA

  • When creating a workload with 'oc new-app' that points to a manifest-listed image, users should be able to set an "--import-mode=" flag to 'PreserveOriginal' in order to preserve all architectures of the manifest list  
  • 'oc new-app --name <name> <manifestlist-image> --import-mode=PreserveOriginal' should not cause pods to fail due to being the incorrect architecture 
  • Ensure node scheduling happens properly on a heterogeneous cluster when running 'oc new-app' with '--import-mode=PreserveOriginal'

 

ImportMode api reference: https://github.com/openshift/api/blob/master/image/v1/types.go#L294

Original issue and discussion: https://coreos.slack.com/archives/CFFJUNP6C/p1664890804998069

 

 

Feature Overview (aka. Goal Summary)  

With this feature it will be possible to autoscale from zero, that is, have MachineSets that create new nodes when none currently exist, for use in a mixed-architecture cluster configured with multi-architecture compute.

 

Goals (aka. expected user outcomes)

To be able to create a machineset and scale from zero in a mixed architecture cluster environment

 

Requirements (aka. Acceptance Criteria):

Create a machineset and scale from zero in a mixed architecture cluster environment

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context that is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Filing a ticket based on this conversation here: https://github.com/openshift/enhancements/pull/1014#discussion_r798674314

Basically the tl;dr here is that we need a way to ensure that MachineSets are properly advertising the architecture that the nodes will eventually have. This is needed so the autoscaler can predict the correct pool to scale up/down. This could be accomplished through user-driven means, like adding node architecture labels to MachineSets; if we have to do this automatically, we need to do some more research and figure out a way.

For autoscaling nodes in a multi-arch compute cluster, node architecture needs to be taken into account because such a cluster could potentially have nodes of up to 4 different architectures. Labels can be propagated today from the MachineSet to the node group, but they have to be injected manually.

This story explores whether the autoscaler can use cloud provider APIs to derive the architecture of an instance type and set the label accordingly rather than it needing to be a manual step.
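
As a sketch of the user-driven approach, a MachineSet could advertise the architecture of the nodes it would create via a scale-from-zero capacity annotation; the annotation key follows the cluster-autoscaler convention and should be treated as an assumption here:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: example-arm64-machineset                 # hypothetical
  namespace: openshift-machine-api
  annotations:
    # Hint for the autoscaler: nodes from this MachineSet will carry this label
    capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=arm64
spec:
  replicas: 0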


Feature Overview (aka. Goal Summary)  

In 4.13 the vSphere CSI migration is in a hybrid state. Greenfield 4.13 clusters have migration enabled by default, while upgraded clusters have it turned off unless explicitly enabled by an administrator (referred to as "opt-in").

This feature tracks the final work items required to enable vSphere CSI migration for all OCP clusters.

More information on the 4.13 vSphere CSI migration is available in the internal FAQ

Goals (aka. expected user outcomes)

Finalise vSphere CSI migration for all clusters ensuring that

  • Greenfield 4.14 clusters have migration enabled
  • Enable migration on clusters upgraded from 4.13 (for those that have it disabled)
  • Enable migration on clusters upgraded from 4.14.
  • Disable the 4.13 featureset that allowed admins to opt in to migration

Regardless of the cluster's state (new or upgraded), which version it is upgrading from, or the status of CSI migration (enabled/disabled), all clusters should have CSI migration enabled.

This feature also includes upgrade checks in 4.12 & 4.13 to ensure that OCP is running on a recommended vSphere version (vSphere 7.0u3L+ or 8.0u2+)

Requirements (aka. Acceptance Criteria):

We should make sure that all issues that prevented us from enabling CSI migration by default in 4.13 are resolved. If some of these issues are fixed in vSphere itself, we might need to check for a certain vSphere build version before proceeding with the upgrade (from 4.12 or 4.13).

 

Use Cases (Optional):

  • New 4.14 clusters
  • Clusters upgraded from 4.13 with migration enabled
  • Clusters upgraded from 4.13 with migration disabled
  • Clusters upgraded from 4.12 (migration disabled)

 

Background

More information on the 4.13 vSphere CSI migration is available in the internal FAQ

Customer Considerations

Customers who upgraded from 4.12 are unlikely to opt in to migration, so we will have quite a few clusters with migration disabled. Given we will enable it in 4.14 for every cluster, we need to be extra careful that all raised issues are fixed, and set upgrade blockers if needed.

Documentation Considerations

Remove all migration opt-in occurrences from the documentation.

Interoperability Considerations

We need to make sure that upgraded clusters are running on top of a vsphere version that contains all the required fixes.

Epic Goal*

Remove FeatureSet InTreeVSphereVolumes that we added in 4.13.

 
Why is this important? (mandatory)

We assume that the CSI Migration will be GA and locked to default in Kubernetes 1.27 / OCP 4.14. Therefore the FeatureSet must be removed.

Scenarios (mandatory) 

See https://issues.redhat.com/browse/STOR-1265 for upgrade from 4.13 to 4.14

 
Dependencies (internal and external) (mandatory)

Same as STOR-1265, just the other way around ("a big revert")

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - 
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Description of problem:

The vsphereStorageDriver is deprecated and we should allow cluster admins to remove that field from the Storage object in 4.14.

This is the validation rule that prevents removing vsphereStorageDriver:
https://github.com/openshift/api/blob/0eef84f63102e9d2dfdb489b18fa22676f2bd0c4/operator/v1/types_storage.go#L42

This was originally put in place to ensure that CSI Migration is not disabled again once it has been enabled. However, in 4.14 there is no way to disable migration, and there is an explicit rule to prevent setting LegacyDeprecatedInTreeDriver. So it should be safe to allow removing the vsphereStorageDriver field in 4.14, as this will not disable migration, and the field will eventually be removed from the API in a future release.
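
For reference, a sketch of the cluster-scoped Storage object with the field set (this is the field an admin would then try to remove):

apiVersion: operator.openshift.io/v1
kind: Storage
metadata:
  name: cluster
spec:
  vsphereStorageDriver: CSIWithMigrationDriver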

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

 

Steps to Reproduce:

1. Set vsphereStorageDriver in the Storage object
2. Try to remove vsphereStorageDriver

Actual results:

* spec: Invalid value: "object": VSphereStorageDriver is required once set

Expected results:

should be allowed

Additional info:

 

Feature Overview (aka. Goal Summary)  

By moving MCO certificate management out of MachineConfigs, certificate rotation can happen at any time, even when pools are paused, and generates no drain or reboot.

Goals (aka. expected user outcomes)

Eliminate problems caused by certificate rotations being blocked by paused pools. Keep certificates up-to-date without disruption to workloads.

Requirements (aka. Acceptance Criteria):

  • MCD reads certificates from our "controllerconfig" directly.

Interoperability Considerations

Windows MCO has been updated to work with this path.

Latest status as of 4.14 freeze:

The MCD no longer uses MachineConfigs to update certs, but rather reads it off our internal resource "controllerconfig" directly. The MachineConfig path still exists but is a no-op (although the MCO still falsely claims an update is pending as a result). The MachineConfig removal work is ready, but waiting for windows-MCO to change their workflow so as to not break them.

 

--------------------------------

 

The logic for handling certificate rotation should live outside of the MachineConfig-files path as it stands today. This will allow certs to rotate live, through paused pools, without generating additional churn in rendered configs; most, if not all, certificates do not require drains/reboots of the node.

 

Context

The MCO has, since the beginning of time, managed certificates. The general flow is a cluster configmap -> MCO -> controllerconfig -> MCC -> renderedconfig -> MCD -> laid down to disk as a regular file.

 

When we talk about certs, the MCD actually manages 4 (originally 5) certs: see https://docs.google.com/document/d/1ehdOYDY-SvUU9ffdIKlt7XaoMaZ0ioMNZMu31-Mo1l4/edit (this document is a bit outdated)

Of these, the only one we care about is "/etc/kubernetes/kubelet-ca.crt", which is a bundle of 5 (now 7) certs. This will be expanded on below.

 

Unlike regular files though, certificates rotate automatically at some set cadence. Prior to 4.7, this would cause the MCD to seemingly randomly start an update and reboot nodes, much to the annoyance of customers, so we made it disruptionless.

 

There was still one more problem: a lot of users pause pools for additional safety (which is their way of saying "we don't want you to disrupt our workloads"), which still gated the certificate from actually rotating in when it updated. In 4.12 and previous versions, this means that at 80% of the 1 year mark, a new kube-apiserver-to-kubelet-signer cert would be generated. After ~12 hours, this would affect some operations (oc logs, etc.) since the old signer no longer matched the apiserver's new cert. At the one year mark, this would proceed to break the kubelet entirely. To combat this, we added an alert, MachineConfigControllerPausedPoolKubeletCA, to warn users about the effects and expiry, which was OK since this should only be an annual occurrence.

 

Updates for 4.13

In 4.13, we realized that the kubelet-ca cert was being read from a wrong location, which updated the kube-apiserver-to-kubelet-signer mentioned above but not some other certs. This was not a problem since nobody was depending on them, but in 4.13 monitoring was updated to use the right certs, which subsequently caused reports of KubeletDown to fire, which David Eads then fixed via https://github.com/openshift/machine-config-operator/pull/3458

So now instead of expired certs we have correct certs, which is great, but we also realized that the cert rotation will happen much more frequently.

 

Previously on the system, we had:

admin-kubeconfig-signer, kubelet-signer, kube-apiserver-to-kubelet-signer, kube-control-plane-signer, kubelet-bootstrap-kubeconfig-signer

 

now with the correct certs, right after install we get: admin-kubeconfig-signer, kube-csr-signer_@1675718562, kubelet-signer, kube-apiserver-to-kubelet-signer, kube-control-plane-signer, kubelet-bootstrap-kubeconfig-signer, openshift-kube-apiserver-operator_node-system-admin-signer@1675718563

 

The most immediate issue was bootstrap drift, which John solved via https://github.com/openshift/machine-config-operator/pull/3513

 

But the issue here is now we are updating two certs:

  1. kube-csr-signer, rotated every month
  2. openshift-kube-controller-manager-operator_csr-signer-signer (called kubelet-signer until the first rotation), rotated every two months

 

Meaning that every month we would be generating at least 2 new machineconfigs (new one rotating in, old one rotating out) to manage this.

During install, due to how the certs are set up (bootstrap ones expire in 24h) this means you get 5 MCs within 24 hours: bootstrap bundle, incluster bundle, incluster bundle with 1 new, incluster bundle with 2 new, incluster bundle with 2 new 2 old removed

On top of this, previously the cluster chugged along past the expiry with only the warning, but now, when the old certs rotate out and the pools are paused, TargetDown and KubeletDown fire after a few hours, making it very bad from a user perspective.

 

Solutions

Solution1: don't do anything

Nothing should badly break, but the user will get critical alerts after ~1 month if they pause and upgrade to 4.13. Not a great UX

Solution2: revert the monitoring change or mask the alert

A bit late, but potentially doable? Masking the alert will likely mask real issues, though

Solution3: MVP MCD changes (Estimate: 1week)

The MCD update, MCD verification, MCD config drift monitor all ignore the kubelet-ca cert file. The MCD gets a new routine to update the file, reading from a configmap the MCC manages. The MCC still renders the cert but the cert will be updated even if the pool is paused

Solution4: MVP MCC changes (Estimate: a few days)

Have the controller splice in changes even when the pool is paused. John has a MVP here: https://github.com/openshift/machine-config-operator/compare/master...jkyros:machine-config-operator:mco-77-bypass-pause 
This is a cleaner solution compared to 3, but will cause the pool to go into updating briefly. If there are other operations causing nodes to be cordoned, etc., we would have to consider overriding that.

Solution5: MCD cert management path (full, Estimate: 1 sprint)

The cert is removed from the rendered-config. The MCC will read it off the controllerconfig and render it into a custom configmap. The MCS will add this additional file when serving content, but it is not part of the rendered-MC otherwise. The MCD will have a new routine to manage the certs live directly.

The bootstrap MCS will also need to have a way to render it into the initial served configuration without it being part of the MachineConfigs (this is especially important for HyperShift). We will have to make sure the inplace updater doesn't break

We may also have to solve config drift problems from bootstrap to incluster, for self-driving and hypershift inplace

We also have to make sure the file isn't deleted upon an update to the new management, so the certs don't disappear for a while, since the MCD would have seen the diff and deleted it.

 

DOCS (WIP)

 

https://docs.google.com/document/d/1qXYV9Hj98QhJSKx_2IWBbU_bxu30YQtbR21mmOtBsIg/edit?usp=sharing

It really hurts to have to ask customers to collect on-disk files for us, and when we do this certificate work there is the possibility we will need to chase more race-condition or rendering-mismatch issues, so let's see if we can get collection of mcs-machine-config-content.json (for bootstrap mismatch) and maybe currentconfig (for those pesky validation issues) added to the must-gather.

 

 

Although we are removing the config from the MachineConfig, Ignition (both in bootstrap and in-cluster) still needs to be generated with the certs, so nodes can join the cluster.

 

We will need the in-cluster MCS to read from controllerconfig, and the bootstrap MCS (during install time) to be able to remove it from the MachineConfigs to ensure no drift when master nodes come up.

Once we finish the new method to manage certs, we should extend it to also manage image registry certs, although that is not required for 4.14

Feature Overview (aka. Goal Summary)  

Having additional MCO metrics is helpful to customers who want to closely monitor the state of their Machines and MachineConfigPools.

 

Requirements (aka. Acceptance Criteria):

Add for each MCP:

    - Paused
    - Updated
    - Updating
    - Degraded
    - Machinecount
    - ReadyMachineCount
    - UpdatedMachineCount
    - DegradedMachineCount

Creating this to version scope the improvements merged into 4.14. Since those changes were in a story, they need an epic.

Customers would like to have some MachineConfigOperator metrics in Prometheus. For each MCP:
    
    - Paused
    - Updated
    - Updating
    - Degraded
    - Machinecount
    - ReadyMachineCount
    - UpdatedMachineCount
    - DegradedMachineCount
   

Why does the customer need this? (List the business requirements here)

These metrics would be really important, as they could show any MachineConfig action (updating, degraded, ...), which could even trigger an alarm with a PrometheusRule. Having a MachineConfig dashboard would also be really useful.
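
As a hypothetical sketch of such a PrometheusRule: the metric and label names below are assumptions and would need to be replaced with whatever the MCO actually exports:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: machine-config-pool-alerts              # hypothetical
  namespace: openshift-machine-config-operator
spec:
  groups:
    - name: machine-config-pools
      rules:
        - alert: MachineConfigPoolDegraded
          # assumed metric name; replace with the real MCO metric
          expr: machine_config_pool_degraded_machine_count > 0
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "MachineConfigPool {{ $labels.pool }} has degraded machines"   # assumed 'pool' label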

 

Feature Overview (aka. Goal Summary)  

The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike. 

Some customer cases have revealed scenarios where the MCO state reporting is misleading and therefore could be unreliable to base decisions and automation on.

In addition to correcting some incorrect states, the MCO will be enhanced for a more granular view of update rollouts across machines.

The MCO should properly report its state in a way that's consistent and able to be understood by customers, troubleshooters, and maintainers alike. 

For this epic, "state" means "what is the MCO doing?" – so the goal here is to try to make sure that it's always known what the MCO is doing. 

This includes: 

  • Conditions
  • Some Logging 
  • Possibly Some Events 

While this probably crosses a little bit into the "status" portion of certain MCO objects, as some state is definitely recorded there, this probably shouldn't turn into a "better status reporting" epic.  I'm interpreting "status" to mean "how is it going" so status is maybe a "detail attached to a state". 

 

Exploration here: https://docs.google.com/document/d/1j6Qea98aVP12kzmPbR_3Y-3-meJQBf0_K6HxZOkzbNk/edit?usp=sharing

 

https://docs.google.com/document/d/17qYml7CETIaDmcEO-6OGQGNO0d7HtfyU7W4OMA6kTeM/edit?usp=sharing

 

During upgrade tests, the MCO will become temporarily degraded with the following events showing up in the event log:

Dec 13 17:34:58.478 E clusteroperator/machine-config condition/Degraded status/True reason/RequiredPoolsFailed changed: Unable to apply 4.11.0-0.ci-2022-12-13-153933: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, pool master has not progressed to latest configuration: controller version mismatch for rendered-master-3c738a0c86e7fdea3b5305265f2a2cdb expected 92012a837e2ed0ed3c9e61c715579ac82ad0a464 has 768f73110bc6d21c79a2585a1ee678d5d9902ad5: 2 (ready 2) out of 3 nodes are updating to latest configuration rendered-master-61c5ab699262647bf12ea16ea08f5782, retrying]

 

This seems to be occurring with some frequency as indicated by its prevalence in CI search:

$ curl -s 'https://search.ci.openshift.org/search?search=clusteroperator%2Fmachine-config+condition%2FDegraded+status%2FTrue+reason%2F.*controller+version+mismatch&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=%5E%28periodic%7Crelease%29.*4%5C.1%5B1%2C2%5D.*&excludeName=&maxMatches=1&maxBytes=20971520&groupBy=job' | jq 'keys | length'
399

 

The MCO should not become degraded during an upgrade unless it cannot proceed with the upgrade. In the case of these failures, I think we're timing out at some point during node reboots as either 1 or 2 of the control plane nodes are ready, with the third being unready. The MCO eventually requeues the syncRequiredMachineConfigPools step and the remaining nodes reboot and the MCO eventually clears the Degraded status.

 

Indeed, looking at the event breakdown, one can see that control plane nodes take ~21 minutes to roll out their new config with OS upgrades. By comparison, the worker nodes take ~15 minutes.

Meanwhile, the portion of the MCO which performs this sync (the syncRequiredMachineConfigPools function) has a hard-coded timeout of 10 minutes. Additionally, to my understanding, there is an additional 10 minute grace period before the MCO marks itself as degraded. Since the control plane nodes took ~21 minutes to completely reboot and roll out their new configs, we've exceeded the time needed. With this in mind, I propose a path forward:

  1. Figure out why control plane nodes are taking > 20 minutes for OS upgrades to be performed. My initial guess is that it has to do with etcd reestablishing quorum before proceeding onto the next control plane node whereas the worker nodes don't need to delay for that. 
  2. If we conclude that OS upgrades just take longer to perform for control plane nodes, then maybe we could bump the timeout. Ideally, we could bump the timeout only for the control plane nodes, but that may take some refactoring to do.

Feature Overview

  • Customers want to create and manage OpenShift clusters using managed identities for Azure resources for authentication.
  • In Phase 2, we want to iterate on the CAPI-related items for ARO/Azure managed identity work, e.g. update cluster-api-provider-azure to consume Azure workload identity tokens, and update the ARO Credentials Request manifest of the Cluster CAPI Operator to use the new API field for requesting permissions.

User Goals

  • A customer using ARO wants to spin up an OpenShift cluster with "az aro create" without needing additional input, i.e. without the need for an AD account or service principal credentials, and the identity used is never visible to the customer and cannot appear in the cluster.
  • As an administrator, I want to deploy OpenShift 4 and run Operators on Azure using access controls (IAM roles) with temporary, limited privilege credentials.

Requirements

  • Azure managed identities must work for installation with all install methods including IPI and UPI, work with upgrades, and day-to-day cluster lifecycle operations.
  • Support HyperShift and non-HyperShift clusters.
  • Support use of Operators with Azure managed identities.
  • Support in all Azure regions where Azure managed identity is available. Note: Federated credentials is associated with Azure Managed Identity, and federated credentials is not available in all Azure regions.

More details at ARO managed identity scope and impact.

 

This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • ...

Out of Scope

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

References

Evaluate whether any of the ARO predefined roles in the credentials request manifests of OpenShift cluster operators grant elevated permissions. Remove any such predefined role from the spec.predefinedRoles field and replace it with the required permissions in the new spec.permissions field.

This effort is dependent on the completion of work for CCO-187, and the effort in dependent modules is planned to be worked on by the CCO team unless individual repo owners can help. Operator owners/teams will be expected to review merge requests and complete the appropriate QE effort for an OpenShift release.

  • azure-sdk-for-go module dependency updated to support workload identity federation.
  • Mount the OIDC token in the operator pod. This needs to go in the deployment. See the example from the addition to the cluster-image-registry-operator here
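For illustration, a minimal sketch of how a bound service account (OIDC) token can be mounted into an operator pod via a projected volume; the deployment name, audience, mount path, and expiration below are assumptions for the sketch, not the exact values used by the cluster-image-registry-operator:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator
spec:
  selector:
    matchLabels:
      app: example-operator
  template:
    metadata:
      labels:
        app: example-operator
    spec:
      containers:
      - name: operator
        image: example-operator:latest
        volumeMounts:
        # The operator reads the federated (OIDC) token from this path.
        - name: bound-sa-token
          mountPath: /var/run/secrets/openshift/serviceaccount
          readOnly: true
      volumes:
      - name: bound-sa-token
        projected:
          sources:
          - serviceAccountToken:
              audience: openshift
              path: token
              expirationSeconds: 3600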

Note: Replace text in red with details of your feature request.

Feature Overview

Extend the Workload Partitioning feature to support multi-node clusters.

Goals

Customers running RAN workloads on C-RAN hubs (i.e. multi-node clusters) who want to maximize the cores available to the workloads (DU) should be able to use workload partitioning (WP) to isolate control-plane processes to reserved cores.
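For context, a minimal sketch of how a cluster is opted into workload partitioning and how the reserved/isolated split is expressed; the CPU ranges and names below are illustrative assumptions, not a complete RAN configuration:

# install-config.yaml excerpt (other required fields omitted): opt the cluster into CPU partitioning at install time
cpuPartitioningMode: AllNodes
---
# PerformanceProfile sketch: the reserved/isolated CPU split shown here is illustrative
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: example-workload-partitioning
spec:
  cpu:
    reserved: "0-3"
    isolated: "4-63"
  nodeSelector:
    node-role.kubernetes.io/worker: ""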

Requirements

A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

requirement Notes isMvp?
     
     
     

 

Describe Use Cases (if needed)

< How will the user interact with this feature? >

< Which users will use this and when will they use it? >

< Is this feature used as part of current user interface? >

Out of Scope

 

Background, and strategic fit

< What does the person writing code, testing, documenting need to know? >

Assumptions

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Customer Considerations

< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>

<Does the Feature introduce data that could be gathered and used for Insights purposes?>

Documentation Considerations

< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >

< What does success look like?>

< Does this feature have doc impact?  Possible values are: New Content, Updates to existing content,  Release Note, or No Doc Impact>

< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>

  • <What concepts do customers need to understand to be successful in [action]?>
  • <How do we expect customers will use the feature? For what purpose(s)?>
  • <What reference material might a customer want/need to complete [action]?>
  • <Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available. >
  • <What is the doc impact (New Content, Updates to existing content, or Release Note)?>

Interoperability Considerations

< Which other products and versions in our portfolio does this feature impact?>

< What interoperability test scenarios should be factored by the layered product(s)?>

Questions

Question Outcome
   

 

 

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Make validation tests run on all platforms by removing skips.

Write a test that executes against known management pods, and creates a management pod, to verify that they adhere to the expected CPU affinity and CPU shares.

Ex:

pgrep kube-apiserver | while read i; do taskset -cp $i; done

The original implementation of workload partitioning tried to leverage CRI-O's default behavior to allow full use of CPU sets when no performance profile is supplied by the user, while still being a CPU-partitioned cluster. This works fine for CPU affinity; however, because we don't supply a config and let the default behavior kick in, CRI-O does not alter the CPU shares and gives all pods a CPU share value of 2.

We need to supply a CRI-O config with an empty string for the CPU set to support both CPU share and CPU affinity behavior when no performance profile is supplied, so that the `resources.requests`, which get translated to CPU shares, are correctly applied in the default state.

Note: this is not an issue with CPU affinity, which still behaves as expected, and when a performance profile is supplied everything works as intended as well. The CPU share mismatch is the only issue being identified here.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Create generic validation tests in the Origin and Release repos to check that a cluster is correctly configured. E2E tests running in a CPU-partitioned cluster should pass.

Incomplete Features

When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Value Statement

The HostedCluster and NodePool specs already have a "pausedUntil" field.

 

pausedUntil:
  description: 'PausedUntil is a field that can be used to pause reconciliation on a resource.
    Either a date can be provided in RFC3339 format or a boolean. If a date is provided:
    reconciliation is paused on the resource until that date. If the boolean true is provided:
    reconciliation is paused on the resource until the field is removed.'
  type: string
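For reference, a minimal sketch of the field in use on a HostedCluster (the API version and metadata shown are illustrative):

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
spec:
  # Either a boolean-like value ("true") or an RFC3339 date, per the field description above.
  pausedUntil: "true"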

 

This option is currently not exposed in the "hypershift create cluster" command.

 

In order to support the HCP create/update automation template with ClusterCurator, users should be able to run "hypershift create cluster" with a pausedUntil flag.

Definition of Done for Engineering Story Owner (Checklist)

  • I can create a hosted cluster with "hypershift create cluster <platform> --pausedUntil true"
  • HostedCluster and NodePool CRs from this command should contain "pausedUntil" field in the spec.
  • Hosted cluster creation should be paused until the pausedUntil=true field is removed from the HostedCluster and NodePool CRs.
  • This should work for the agent, KubeVirt, and AWS platforms.

Development Complete

  • The code is complete.
  • Functionality is working.
  • Any required downstream Docker file changes are made.

Tests Automated

  • [ ] Unit/function tests have been automated and incorporated into the
    build.
  • [ ] 100% automated unit/function test coverage for new or changed APIs.

Secure Design

  • [ ] Security has been assessed and incorporated into your threat model.

Multidisciplinary Teams Readiness

Support Readiness

  • [ ] The must-gather script has been updated.
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Description

“As a dev user, I want to use the silences as admins do, so I can get the same features”

Acceptance Criteria

Given a dev user logged in to the console and using a developer perspective

When the user navigates to the observe section

Then the user can see a silences tab that has the same features as the admin view but restricted to the currently selected namespace

Overview

This feature aims to provide a monitoring story for customers of a self-managed Hosted Control Plane (ACM/MCE with HCP) by reusing the pluggable dashboard console feature in the OCP console as the MVP when ACM is not in use. This feature will allow for enhanced observability and an improved user experience. An example of how such a dashboard can be configured is below:

apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    console.openshift.io/dashboard: "true"
  name: basic-hcp-dashboard
  namespace: hypershift
data: ...

Key Considerations

  • Dashboard creation is to be initiated when the customer opts in for all metrics (not just telemetry). By default, not all metrics are exported to avoid overloading the monitoring stack. 
  • The dashboard will track key Service Level Indicators (SLIs) and Service Level Objectives (SLOs) such as API availability, API server error rates, resource usage for the rest of the control plane, and, in the future, latency between the control plane and workers. We will start with the three easiest metrics to implement.
  • Additionally, alerts should be exposed to highlight symptoms (see the sketch after this list).
  • We aim to provide a pragmatic, if not aesthetically perfect, user experience from a monitoring standpoint without muddling our ACM messaging. The north star here is the ACM observability stack as a sustainable, comprehensive monitoring solution.
  • Dashboard configuration is per HCP, with each HCP living in its own OpenShift project (namespace). This is compatible with the tenancy model of User Workload Monitoring (UWM).
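As a sketch of how such symptom alerts could be surfaced through UWM, a PrometheusRule in the HCP namespace might look like the following; the alert name, expression, threshold, and namespace are illustrative assumptions rather than the agreed-upon SLOs:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-hcp-slo-alerts
  # Illustrative HCP namespace; each HCP lives in its own project.
  namespace: clusters-example
spec:
  groups:
  - name: hcp-api-availability
    rules:
    - alert: HostedAPIServerHighErrorRate
      # Illustrative expression and threshold: alert when more than 5% of API requests fail.
      expr: |
        sum(rate(apiserver_request_total{code=~"5.."}[5m]))
          / sum(rate(apiserver_request_total[5m])) > 0.05
      for: 15m
      labels:
        severity: warning
      annotations:
        summary: Hosted control plane API server error rate is above 5% for 15 minutes.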

Open Discussion / Long-term Concerns 

The usage of UWM for HCP metrics on the management cluster has a few drawbacks:

  • Configuration via ConfigMap is more error-prone and less GitOps-friendly.
  • There are fewer configuration knobs than are available out of the box with the Observability Operator (ObO), and the delivery model is slower, being bound to the OCP release cadence.

These issues would be resolved by using ObO, which is currently being productized.

Acceptance Criteria

  1. Introduction of custom dashboards via the OCP console dashboard plugin feature. 
  2. The dashboard provides monitoring and tracking for the agreed-upon SLIs/SLOs.
  3. The dashboard configuration is per HCP, aligning with the tenancy model of UWM.
  4. Alerts are exposed highlighting symptoms, potentially following the runbooks: https://github.com/openshift/runbooks/tree/master/alerts
  5. Successful communication and cooperation with the rest of the team to ensure no details are missed, and the right story is communicated to the customer in our documentation

Goal

  • Each self-managed MCE deployment of hosted clusters should bring a dashboard that provides key metrics of the hosted cluster. 

Why is this important?

  • Customers need a standardized view of the key metrics in their cluster to be able to manage their infrastructure in general and the management cluster in particular

Scenarios

  1. A management cluster admin wants to see the metrics for the hosted clusters that are deployed in their infrastructure

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement
  • Docs on the presence of the dashboard

Dependencies (internal and external)

  1. Operator dashboard installation mechanism

Previous Work (Optional):

  1. CI dashboards

Open questions:

  1. Which are the key metrics?
  2. What should be the mechanism to enable this for only self-managed deployments?
  3. Should the dashboards be present in ACM deployments?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

As a hosted cluster deployer I want to have the HyperShift Operator:

  • Create a dashboard for inclusion in the OpenShift console that is populated with the key metrics for the new cluster
  • Delete the hosted cluster metrics dashboard when a hosted cluster is removed

so that:

  • I can determine the status and health of my hosted clusters over time
  • I can identify deterioration in the service

https://docs.google.com/document/d/1UwHwkL-YtrRJYm-A922IeW3wvKEgCR-epeeeh3CBOGs/edit

configMap example: https://github.com/openshift/console-dashboards-plugin/blob/main/docs/add-datasource.md

tldr: three basic claims, the rest is explanation and one example

  1. We cannot improve long term maintainability solely by fixing bugs.
  2. Teams should be asked to produce designs for improving maintainability/debugability.
  3. Specific maintenance items (or investigation of maintenance items) should be placed into planning as peers to PM requests and explicitly prioritized against them.

While bugs are an important metric, fixing bugs is different from investing in maintainability and debuggability. Investing in fixing bugs will help alleviate immediate problems, but doesn't improve the ability to address future problems. You (may) get a code base with fewer bugs, but when you add a new feature, it will still be hard to debug problems and interactions. This pushes a code base towards stagnation where it gets harder and harder to add features.

One alternative is to ask teams to produce ideas for how they would improve future maintainability and debuggability instead of focusing on immediate bugs. This would produce designs that make problem determination, bug resolution, and future feature additions faster over time.

I have a concrete example of one such outcome of focusing on bugs vs. quality. We have resolved many bugs about communication failures with ingress by finding problems with point-to-point network communication. We have fixed the individual bugs, but have not improved the code for future debugging. In so doing, we chase many hard-to-diagnose problems across the stack. The alternative is to create a point-to-point network connectivity capability. This would immediately improve bug resolution and stability (detection) for Kuryr, OVS, legacy SDN, network edge, kube-apiserver, openshift-apiserver, authentication, and console. Bug fixing does not produce the same impact.

We need more investment in our future selves. Saying, "teams should reserve this" doesn't seem to be universally effective. Perhaps an approach that directly asks for designs and impacts and then follows up by placing the items directly in planning and prioritizing against PM feature requests would give teams the confidence to invest in these areas and give broad exposure to systemic problems.


Relevant links:

OCP/Telco Definition of Done

Epic Template descriptions and documentation.

Epic Goal

Why is this important?

Drawbacks

  • N/A

Scenarios

  • CI Testing

Acceptance Criteria

  • CI - MUST be running successfully with tests automated

Dependencies (internal and external)

  1. SDN Team

Previous Work (Optional):

  1. N/A

Open questions::

  1. N/A

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub
    Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub
    Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • To refactor various unit tests in cluster-ingress-operator to align with the desired unit test standards. The unit tests need various cleanups to meet the standards of the network edge team, such as:
    • Using t.Run in all unit tests for sub-test capabilities
    • Removing extraneous test cases
    • Fixing incorrect error messages

Why is this important?

  • Maintaining standards in unit tests is important for the debuggability of our code

Scenarios

  1. ...

Acceptance Criteria

  • Unit tests generally meet our software standards

Dependencies (internal and external)

  1.  

Previous Work (Optional):

  1. For shift week, Miciah provided a handful of commits (https://github.com/Miciah/cluster-ingress-operator/commits/gateway-api) that were the motivation for creating this epic.

Open questions::

  1. N/A

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Test_desiredHttpErrorCodeConfigMap contains a section with dead code in its nil checks (roughly expect == nil || actual == nil). Clean this up.

Also replace Ruby-style #{} syntax for string interpolation with Go string formats.

Refactor Test_desiredLoadBalancerService to match our unit test standards, remove extraneous test cases, and make it more readable/maintainable.

Unit test names should be formatted as Test_ followed by the function name, so that the scope of the function (private or public) is preserved.

Go 1.16 added the new embed directive. The embed directive lets you natively (and trivially) compile static asset files into your binary.

The current go-bindata dependency that's used in both the Ingress and DNS operators for YAML asset compilation could be dropped in favor of the new go embed functionality. This would reduce our dependency count, remove the need for `bindata.go` (which is version controlled and constantly updated), and make our code easier to read. This switch would also reduce the overall lines of code in our repos.

Note that this may be applicable to OCP 4.8 if and when images are built with go 1.16.

Goal

  • The goal is to remove Kuryr from the payload and stop offering it as an SDN option.

Why is this important?

  • Kuryr is deprecated, we have a migration path, and dropping it frees up a lot of resources.

Acceptance Criteria

  • CI removed in 4.15.
  • Images no longer part of the payload.
  • Installer not accepting Kuryr as SDN option.
  • Docs and release notes updated.

Feature Overview:

Hypershift-provisioned clusters, regardless of the cloud provider, support the proposed integration for OLM-managed operators outlined in OCPBU-559 and OCPBU-560.

 

Goals 

There is no degradation in the capability or coverage of OLM-managed operators supporting short-lived token authentication on clusters that are lifecycled via Hypershift.

 

Requirements:

  • the flows in OCPBU-559 and OCPBU-560 need to work unchanged on Hypershift-managed clusters
  • most likely this means that Hypershift needs to adopt the CloudCredentialOperator
  • all operators enabled as part of OCPBU-563, OCPBU-564, OCPBU-566 and OCPBU-568 need to be able to leverage short-lived authentication on Hypershift-managed clusters without being aware that they are on Hypershift-managed clusters
  • also OCPBU-569 and OCPBU-570 should be achievable on Hypershift-managed clusters

 

Background

Currently, Hypershift lacks support for CCO.

Customer Considerations

Currently, Hypershift is limited to deploying clusters in which the cluster core operators leverage short-lived token authentication exclusively.

Documentation Considerations

If we are successful, no special documentation should be needed for this.

 

Outcome Overview

Operators on guest clusters can take advantage of the new tokenized authentication workflow that depends on CCO.

Success Criteria

CCO is included in HyperShift and its footprint is minimal while meeting the above outcome.

 

Expected Results (what, how, when)

 

 

Post Completion Review – Actual Results

After completing the work (as determined by the "when" in Expected Results above), list the actual results observed / measured during Post Completion review(s).

 

CCO currently deploys the pod identity webhook as part of its deployment. As part of the effort to reduce the footprint of CCO, the deployment of this pod should be conditional on the infrastructure.

This epic tracks work related to designing how to include CCO into HyperShift in order for operators on guest clusters to leverage the STS UX defined by this project.

Feature Overview (aka. Goal Summary)  

An elevator pitch (value statement) that describes the Feature in a clear, concise way.  Complete during New status.

  • Enable partners to create OpenShift-based appliances

Goals (aka. expected user outcomes)

The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.

  • Enable partners to create OpenShift-based appliances, which means OpenShift plus other operators plus partner software.
  • These appliances may be deployed in disconnected and/or remote sites.
  • These appliances may be SNO (lower priority) or multi-node.
  • These appliances may be physical (metal) or virtual (generally vSphere).
  • The appliance cannot rely on any external infrastructure (e.g., registry, DNS, etc.)
  • The full lifecycle of the appliance must be supported in a user-friendly manner (deploy, upgrade, backup/restore, redeploy).
  • See MGMT-13122 for feature details.

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

  • Provide documentation for how partners can use solution in KCS article
  • Develop blog and/or video that describes solution and how to use it
  • Technical enablement material

 

Anyone reviewing this Feature needs to know which deployment configurations that the Feature will apply to (or not) once it's been completed.  Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out-of-scope for a given release, ensure you provide the OCPSTRAT (for the future to be supported configuration) as well.

Deployment considerations List applicable specific needs (N/A = not applicable)
Self-managed, managed, or both Self-managed (though could be managed by partner)
Classic (standalone cluster) Classic
Hosted control planes Future
Multi node, Compact (three node), or Single node (SNO), or all SNO
Connected / Restricted Network All – connected and disconnected, air-gapped
Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) x86_64
Operator compatibility TBD
Backport needed (list applicable versions) N/A
UI need (e.g. OpenShift Console, dynamic plugin, OCM) N/A
Other (please specify)  

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

<your text here>

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

<your text here>

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

<your text here>

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

In OCP 4.14, we provided the ability to pass cluster configs to the agent-based installer (AGI) after booting image (AGENT-559).

In OCP 4.15, we published upstream documentation on how to use the Appliance image builder utility to build disk images with the Agent-based Installer to enable appliance installations; see https://github.com/openshift/appliance/blob/master/docs/user-guide.md. This is "Dev Preview". The appliance tooling is currently supported and maintained by ecosystem engineering.

In OCP 4.16, this Appliance image builder utility will be bundled and shipped and will be available at registry.redhat.io (we are “productizing” this part). In the near term, we’ll document this via KCS and not official docs (to minimize confusion about documenting a feature that only impacts a small subset of appliance partners).

This appliance tool combines 2 features:

  • Registry-less clusters (air-gapped) - currently there are no plans for the installation part outside of the appliance (unless we have a good solution planned for scale-out and upgrades).
  • Disk image generation from cluster configuration for appliance use

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

<your text here>

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

<your text here>

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

<your text here>

Epic Goal

  • Enable partners to create OpenShift-based appliances, which means OpenShift plus other operators plus partner software.
  • These appliances may be deployed in disconnected and/or remote sites.
  • These appliances may be SNO (lower priority) or multi-node.
  • These appliances may be physical (metal) or virtual (generally vSphere).
  • The appliance cannot rely on any external infrastructure (e.g., registry, DNS, etc.)
  • The full lifecycle of the appliance must be supported in a user-friendly manner (deploy, upgrade, backup/restore, redeploy).

Why is this important?

  • Grow the business

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

BU Priority Overview

Enable installation and lifecycle support of OpenShift 4 on Oracle Cloud Infrastructure (OCI) Bare metal

Goals

  • Validate OpenShift on OCI bare metal to make it officially supported.
  • Enable installation of OpenShift 4 on OCI bare metal using the Assisted Installer.
  • Provide published installation instructions for how to install OpenShift on OCI bare metal.
  • OpenShift 4 on OCI bare metal can be updated, resulting in a cluster and applications that are in a healthy state when the update is completed.
  • Telemetry reports back on clusters using OpenShift 4 on OCI bare metal for connected OpenShift clusters (e.g. platform=external or none, plus some other indicator to know it's running on OCI bare metal).

Use scenarios

  • As a customer, I want to run OpenShift Virtualization on OpenShift running on OCI bare metal.
  • As a customer, I want to run Oracle BRM on OpenShift running on OCI bare metal.

Why is this important

  • Customers who want to move from on-premises to Oracle cloud baremetal
  • OpenShift Virtualization is currently only supported on baremetal

Requirements

 

Requirement Notes
OCI Bare Metal Shapes must be certified with RHEL It must also work with RHCOS (see iSCSI boot notes) as OCI BM standard shapes require RHCOS iSCSI to boot (OCPSTRAT-1246)
Certified shapes: https://catalog.redhat.com/cloud/detail/249287
Successfully passing the OpenShift Provider conformance testing – this should be fairly similar to the results from the OCI VM test results. Oracle will do these tests.
Updating Oracle Terraform files  
Making the Assisted Installer modifications needed to address the CCM changes and surface the necessary configurations. Support Oracle Cloud in Assisted-Installer CI: MGMT-14039

 

RFEs:

  • RFE-3635 - Supporting Openshift on Oracle Cloud Infrastructure(OCI) & Oracle Private Cloud Appliance (PCA)

OCI Bare Metal Shapes to be supported

Any bare metal Shape to be supported with OCP has to be certified with RHEL.

From the certified shapes, those that have local disks will be supported. This is due to the current lack of support in RHCOS for the iSCSI boot feature. OCPSTRAT-749 tracks adding this support and removing this restriction in the future.

As of Aug 2023 this excludes at least all the Standard shapes, BM.GPU2.2 and BM.GPU3.8, from the published list at: https://docs.oracle.com/en-us/iaas/Content/Compute/References/computeshapes.htm#baremetalshapes 

Assumptions

  • Pre-requisite: RHEL certification which includes RHEL and OCI baremetal shapes (instance types) has successfully completed.

 

 

 

 
 

Feature goal (what are we trying to solve here?)

Please describe what this feature is going to do.

DoD (Definition of Done)

Please describe what conditions must be met in order to mark this feature as "done".

Does it need documentation support?

If the answer is "yes", please make sure to check the corresponding option.

Feature origin (who asked for this feature?)

  • A Customer asked for it

    • Name of the customer(s)
    • How many customers asked for it?
    • Can we have a follow-up meeting with the customer(s)?

 

  • A solution architect asked for it

    • Name of the solution architect and contact details
    • How many solution architects asked for it?
    • Can we have a follow-up meeting with the solution architect(s)?

 

  • Internal request

    • Who asked for it?

 

  • Catching up with OpenShift

Reasoning (why it’s important?)

  • Please describe why this feature is important
  • How does this feature help the product?

Competitor analysis reference

  • Do our competitors have this feature?
    • Yes, they have it and we can have some reference
    • No, it's unique or explicit to our product
    • No idea. Need to check

Feature usage (do we have numbers/data?)

  • We have no data - the feature doesn’t exist anywhere
  • Related data - the feature doesn’t exist but we have info about the usage of associated features that can help us
    • Please list all related data usage information
  • We have the numbers and can relate to them
    • Please list all related data usage information

Feature availability (why should/shouldn't it live inside the UI/API?)

  • Please describe the reasoning behind why it should/shouldn't live inside the UI/API
  • If it's for a specific customer we should consider using AMS
  • Does this feature exist in the UI of other installers?

Feature Overview

  • As a Cluster Administrator, I want to opt-out of certain operators at deployment time using any of the supported installation methods (UPI, IPI, Assisted Installer, Agent-based Installer) from UI (e.g. OCP Console, OCM, Assisted Installer), CLI (e.g. oc, rosa), and API.
  • As a Cluster Administrator, I want to opt-in to previously-disabled operators (at deployment time) from UI (e.g. OCP Console, OCM, Assisted Installer), CLI (e.g. oc, rosa), and API.
  • As a ROSA service administrator, I want to exclude/disable Cluster Monitoring when I deploy OpenShift with HyperShift, using any of the supported installation methods including the ROSA wizard in OCM and the rosa CLI, since I get cluster metrics from the control plane. This configuration should be persisted not only through initial deployment but also through cluster lifecycle operations like upgrades.
  • As a ROSA service administrator, I want to exclude/disable the Ingress Operator when I deploy OpenShift with HyperShift, using any of the supported installation methods including the ROSA wizard in OCM and the rosa CLI, as I want to use my preferred load balancer (i.e. the AWS load balancer). This configuration should be persisted not only through initial deployment but also through cluster lifecycle operations like upgrades.

Goals

  • Make it possible for customers and Red Hat teams producing OCP distributions/topologies/experiences to enable/disable some CVO components while still keeping their cluster supported.
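For context, cluster capabilities of this kind are typically selected at install time through install-config.yaml; a minimal sketch follows (the capability names listed are illustrative examples, not an authoritative set):

# install-config.yaml excerpt (other required fields omitted)
capabilities:
  # Start from no optional capabilities...
  baselineCapabilitySet: None
  # ...and opt back in to the ones this cluster needs (names are illustrative).
  additionalEnabledCapabilities:
  - marketplace
  - openshift-samples
  - NodeTuning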

Scenarios

  1. This feature must consider the different deployment footprints including self-managed and managed OpenShift, connected vs. disconnected (restricted and air-gapped), supported topologies (standard HA, compact cluster, SNO), etc.
  2. Enabled/disabled configuration must persist throughout cluster lifecycle including upgrades.
  3. If there's any risk/impact of data loss or service unavailability (for Day 2 operations), the system must provide guidance on what the risks are and let the user decide whether the risk is worth undertaking.

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • ...

Out of Scope

Background, and strategic fit

This is part of the overall multi-release Composable OpenShift effort (OCPPLAN-9638), which is being delivered in multiple phases:

Phase 1 (OpenShift 4.11): OCPPLAN-7589 Provide a way with CVO to allow disabling and enabling of operators

  • CORS-1873 Installer to allow users to select OpenShift components to be included/excluded
  • OTA-555 Provide a way with CVO to allow disabling and enabling of operators
  • OLM-2415 Make the marketplace operator optional
  • SO-11 Make samples operator optional
  • METAL-162 Make cluster baremetal operator optional
  • OCPPLAN-8286 CI Job for disabled optional capabilities

Phase 2 (OpenShift 4.12): OCPPLAN-7589 Provide a way with CVO to allow disabling and enabling of operators

Phase 3 (OpenShift 4.13): OCPBU-117

  • OTA-554 Make oc aware of cluster capabilities
  • PSAP-741 Make Node Tuning Operator (including PAO controllers) optional

Phase 4 (OpenShift 4.14): OCPSTRAT-36 (formerly OCPBU-236)

  • OCPBU-352 Make Ingress Operator optional
  • CCO-186 ccoctl support for credentialing optional capabilities
  • MCO-499 MCD should manage certificates via a separate, non-MC path (formerly IR-230 Make node-ca managed by CVO)
  • CNF-5642 Make cluster autoscaler optional
  • CNF-5643 - Make machine-api operator optional
  • WRKLDS-695 - Make DeploymentConfig API + controller optional
  • CNV-16274 OpenShift Virtualization on the Red Hat Application Cloud (not applicable)

  • CNF-9115 - Leverage Composable OpenShift feature to make control-plane-machine-set optional
  • BUILD-565 - Make Build v1 API + controller optional
  • CNF-5647 Leverage Composable OpenShift feature to make image-registry optional (replaces IR-351 - Make Image Registry Operator optional)

Phase 5 (OpenShift 4.15): OCPSTRAT-421 (formerly OCPBU-519)

  • OCPVE-634 - Leverage Composable OpenShift feature to make olm optional
  • CCO-419 (OCPVE-629) - Leverage Composable OpenShift feature to make cloud-credential  optional

Phase 6 (OpenShift 4.16): OCPSTRAT-731

Phase 7 (OpenShift 4.17): OCPSTRAT-1308

  • IR-400 - Remove node-ca from CIRO*
  • CCO-493 Make Cloud Credential Operator optional for remaining providers and topologies (non-SNO topologies)
  • CNF-9116 Leverage Composable OpenShift feature to make machine-auto-approver optional

References

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

 

 

 

Epic Goal

  • Remove node-ca from Cluster Image Registry Operator - its functionality is provided by the MCO

Why is this important?

  • To avoid potential issues, a single component should handle certificate distribution in OCP clusters

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • The registry should continue to work on Hypershift

Dependencies (internal and external)

  1.   HOSTEDCP-1160

Previous Work (Optional):

  1. IR-351

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Once the MCO team is done moving the node-ca functionality to the MCO (MCO-499), we need to remove the node-ca from CIRO.

ACCEPTANCE CRITERIA

  • New clusters provisioned with the registry installed come without the node-ca daemon deployed
  • Existing clusters after upgrade will have the node-ca daemon removed
  • Works on Hypershift

Feature Overview (aka. Goal Summary)  

The storage operators need to be automatically restarted after the certificates are renewed.

From the OCP docs: "The service CA certificate, which issues the service certificates, is valid for 26 months and is automatically rotated when there is less than 13 months validity left."

Since OCP now offers an 18-month lifecycle per release, the storage operator pods need to be automatically restarted after the certificates are renewed.
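One common pattern for achieving this (a sketch of a generic approach, not necessarily the mechanism the storage operators will use) is for the operator to stamp the operand's pod template with a hash of the mounted serving-cert secret, so that a certificate rotation rolls the deployment:

# Sketch only: the annotation key and the way the hash is computed are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-csi-driver-controller
spec:
  selector:
    matchLabels:
      app: example-csi-driver-controller
  template:
    metadata:
      labels:
        app: example-csi-driver-controller
      annotations:
        # Updated by the operator whenever the serving-cert secret content changes,
        # which triggers a rolling restart of the controller pods.
        operator.openshift.io/serving-cert-hash: sha256-of-secret-data
    spec:
      containers:
      - name: kube-rbac-proxy
        image: example-kube-rbac-proxy:latest
        volumeMounts:
        - mountPath: /etc/tls/private
          name: metrics-serving-cert
      volumes:
      - name: metrics-serving-cert
        secret:
          secretName: example-metrics-serving-cert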

Goals (aka. expected user outcomes)

The storage operators will be transparently restarted. The benefit should be transparent to the customer: it avoids manual restarts of the storage operators.

 

Requirements (aka. Acceptance Criteria):

The administrator should not need to restart the storage operators when certificates are renewed.

This should apply to all relevant operators with a consistent experience.

 

Use Cases (Optional):

As an administrator I want the storage operators to be automatically restarted when certificates are renewed.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

This feature request is triggered by the new extended OCP lifecycle. We are moving from 12 to 18 months support per release.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

No doc is required

 

Interoperability Considerations

This feature only covers storage, but the same behavior should be applied to every relevant component.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The pod `azure-file-csi-driver-controller` mounts the secret:

$ oc get po -n openshift-cluster-csi-drivers azure-file-csi-driver-controller-cf84d5cf5-pzbjn -o yaml
...
  containers:
    name: driver-kube-rbac-proxy

    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert

  volumes:
    secret:
      defaultMode: 420
      secretName: azure-file-csi-driver-controller-metrics-serving-cert

Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted

The pod `alibaba-disk-csi-driver-controller` mounts the secret:

$ cat assets/controller.yaml
...
      containers:
        - name: provisioner-kube-rbac-proxy

          volumeMounts:
          - mountPath: /etc/tls/private
            name: metrics-serving-cert

      volumes:
        - name: metrics-serving-cert
          secret:
            secretName: alibaba-disk-csi-driver-controller-metrics-serving-cert

Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted

The pod `openstack-cinder-csi-driver-controller` mounts the secret:

$ oc get po/openstack-cinder-csi-driver-controller-689b897df8-cx5hl -oyaml|yq .spec.volumes
- emptyDir: {}
  name: socket-dir
- name: secret-cinderplugin
  secret:
    defaultMode: 420
    items:
      - key: clouds.yaml
        path: clouds.yaml
    secretName: openstack-cloud-credentials
- configMap:
    defaultMode: 420
    items:
      - key: cloud.conf
        path: cloud.conf
    name: cloud-conf
  name: config-cinderplugin
- configMap:
    defaultMode: 420
    items:
      - key: ca-bundle.pem
        path: ca-bundle.pem
    name: cloud-provider-config
    optional: true
  name: cacert
- name: metrics-serving-cert
  secret:
    defaultMode: 420
    secretName: openstack-cinder-csi-driver-controller-metrics-serving-cert
- configMap:
    defaultMode: 420
    items:
      - key: ca-bundle.crt
        path: tls-ca-bundle.pem
    name: openstack-cinder-csi-driver-trusted-ca-bundle
  name: non-standard-root-system-trust-ca-bundle
- name: kube-api-access-hz62v
  projected:
    defaultMode: 420
    sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
            - key: ca.crt
              path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
      - configMap:
          items:
            - key: service-ca.crt
              path: service-ca.crt
          name: openshift-service-ca.crt

Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted

The pod `shared-resource-csi-driver-node` mounts the secret:

$ cat assets/node.yaml
...
      containers:
        - name: hostpath

          volumeMounts:
            - mountPath: /etc/secrets
              name: shared-resource-csi-driver-node-metrics-serving-cert

      volumes:
        - name: shared-resource-csi-driver-node-metrics-serving-cert
          secret:
            defaultMode: 420
            secretName: shared-resource-csi-driver-node-metrics-serving-cert

Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted

The pod `aws-ebs-csi-driver-controller` mounts the secret:

$ oc get po -n openshift-cluster-csi-drivers aws-ebs-csi-driver-controller-559f74d7cd-5tk4p -o yaml
...
    name: driver-kube-rbac-proxy
    name: provisioner-kube-rbac-proxy
    name: attacher-kube-rbac-proxy
    name: resizer-kube-rbac-proxy
    name: snapshotter-kube-rbac-proxy

    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert

  volumes:
  - name: metrics-serving-cert
    secret:
      defaultMode: 420
      secretName: aws-ebs-csi-driver-controller-metrics-serving-cert

Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted

1. The pod `vmware-vsphere-csi-driver-controller` mounts the secret:

$ oc get po -n openshift-cluster-csi-drivers vmware-vsphere-csi-driver-controller-8467ddf4c-5lgd8 -o yaml
...
  containers:
    name: driver-kube-rbac-proxy
    name: provisioner-kube-rbac-proxy
    name: attacher-kube-rbac-proxy
    name: resizer-kube-rbac-proxy
    name: snapshotter-kube-rbac-proxy
    name: syncer-kube-rbac-proxy

    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert

  volumes:
  - name: metrics-serving-cert
    secret:
      defaultMode: 420
      secretName: vmware-vsphere-csi-driver-controller-metrics-serving-cert

Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted.

2. Similarly, the pod `vmware-vsphere-csi-driver-webhook` mounts another secret:

$ oc get po -n openshift-cluster-csi-drivers vmware-vsphere-csi-driver-webhook-c557dbf54-crrxp -o yaml
...
  containers:
    name: vsphere-webhook

    volumeMounts:
    - mountPath: /etc/webhook/certs
      name: certs

  volumes:
  - name: certs
    secret:
      defaultMode: 420
      secretName: vmware-vsphere-csi-driver-webhook-secret

Again, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted.

The pod `shared-resource-csi-driver-webhook` mounts the secret:

$ cat assets/webhook/deployment.yaml
kind: Deployment
metadata:
  name: shared-resource-csi-driver-webhook
  ...
spec:
  template:
    spec:
      containers:

        volumeMounts:
        - mountPath: /etc/secrets/shared-resource-csi-driver-webhook-serving-cert/
          name: shared-resource-csi-driver-webhook-serving-cert

      volumes:
      - name: shared-resource-csi-driver-webhook-serving-cert
        secret:
          secretName: shared-resource-csi-driver-webhook-serving-cert

Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted.

The pod `ibm-powervs-block-csi-driver-controller` mounts the secret:

$ cat assets/controller.yaml
...
    containers:
        - name: provisioner-kube-rbac-proxy

          volumeMounts:
          - mountPath: /etc/tls/private
            name: metrics-serving-cert

      volumes:
        - name: metrics-serving-cert
          secret:
            secretName: ibm-powervs-block-csi-driver-controller-metrics-serving-cert

 Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted

The pod `openstack-manila-csi-controllerplugin` mounts the secret:

$ cat assets/controller.yaml
...
      containers:
        - name: provisioner-kube-rbac-proxy

          volumeMounts:
          - mountPath: /etc/tls/private
            name: metrics-serving-cert

      volumes:
        - name: metrics-serving-cert
          secret:
            secretName: manila-csi-driver-controller-metrics-serving-cert

Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted

The pod `gcp-pd-csi-driver-controller` mounts the secret:

$ oc get po -n openshift-cluster-csi-drivers gcp-pd-csi-driver-controller-5787b9c477-q78qx -o yaml
...
    name: provisioner-kube-rbac-proxy
    ...

    volumeMounts:
    - mountPath: /etc/tls/private
      name: metrics-serving-cert

  volumes:
  - name: metrics-serving-cert
    secret:
      secretName: gcp-pd-csi-driver-controller-metrics-serving-cert

Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The pod `csi-snapshot-webhook` mounts the secret:
```

$ cat assets/webhook/deployment.yaml
kind: Deployment
metadata:
  name: csi-snapshot-webhook
  ...
spec:
  template:
    spec:
      containers:

        volumeMounts:
          - name: certs
            mountPath: /etc/snapshot-validation-webhook/certs

      volumes:
      - name: certs
        secret:
          secretName: csi-snapshot-webhook-secret

```
Hence, if the secret is updated (e.g. as a result of CA cert update), the Pod must be restarted.

Goals

Track goals/requirements for self-managed GA of Hosted control planes on AWS using the AWS Provider.

  • AWS flow via the AWS provider is documented. 
    • Make sure the documentation with HyperShiftDeployment is removed.
    • Make sure the documentation uses the new flow without HyperShiftDeployment 
  • HyperShift has a UI wizard with ACM/MCE for AWS. 

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.
Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

(Optional) Use Cases

This Section:

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

Questions to answer…

  • ...

Out of Scope

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

Assumptions

  • ...

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?
  • New Content, Updates to existing content, Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?

Overview

Today the upstream and more complete documentation of HyperShift lives at https://hypershift-docs.netlify.app/.

However, product documentation today lives under https://access.redhat.com/login?redirectTo=https%3A%2F%2Faccess.redhat.com%2Fdocumentation%2Fen-us%2Fred_hat_advanced_cluster_management_for_kubernetes%2F2.6%2Fhtml%2Fmulticluster_engine%2Fmulticluster_engine_overview%23hosted-control-planes-intro

Goal

The goal of this Epic is to extract important docs and establish parity between what's documented and possible upstream and product documentation.

 

Multiple consumers have not realized that a newer version of the CPO (spec.release) is not guaranteed to work with an older HO.

This is stated here https://hypershift-docs.netlify.app/reference/versioning-support/

but empirical evidence, like the OCM integration, tells us this is not enough.

We already deploy a ConfigMap (CM) in the HO namespace with the supported HC versions.
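For illustration, the shape of that ConfigMap might look like the following; the name, namespace, and data layout here are assumptions for the sketch, not the exact manifest the HyperShift Operator deploys:

apiVersion: v1
kind: ConfigMap
metadata:
  name: supported-versions
  namespace: hypershift
data:
  supported-versions: '{"versions":["4.14","4.13","4.12"]}'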

Additionally, we can add an image label with the latest HC version supported by the operator, so you can quickly docker inspect...

Feature Overview

Console enhancements based on customer RFEs that improve customer user experience.

 

Goals

  • This Section: Provide a high-level goal statement, providing user context and expected user outcome(s) for this feature

 

Requirements

  • This Section: A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts. If a non-MVP requirement slips, it does not shift the feature.

 

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES

 

(Optional) Use Cases

This Section: 

  • Main success scenarios - high-level user stories
  • Alternate flow/scenarios - high-level user stories
  • ...

 

Questions to answer…

  • ...

 

Out of Scope

 

Background, and strategic fit

This Section: What does the person writing code, testing, documenting need to know? What context can be provided to frame this feature.

 

Assumptions

  • ...

 

Customer Considerations

  • ...

 

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
  • Does this feature have doc impact?  
  • New Content, Updates to existing content,  Release Note, or No Doc Impact
  • If unsure and no Technical Writer is available, please contact Content Strategy.
  • What concepts do customers need to understand to be successful in [action]?
  • How do we expect customers will use the feature? For what purpose(s)?
  • What reference material might a customer want/need to complete [action]?
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

For the console, we would like to have a way for customers to send direct feedback about features like multi cluster.

Acceptance criteria:

  • integrate the pf feedback extension into the console
  • add the ui for rendering the form / launching the feedback url
  • add e2e testing for the customer interaction with the feedback mechanism
  • show/hide the launch mechanism where appropriate (need more info here on this topic)

Testing instructions:

  • Right click the help button in the toolbar.
  • From the help button drop-down, right click on Share Feedback. Previously this was Report a bug, but it has now been replaced with Share Feedback.
  • The Share Feedback modal should appear.
  • Click on the share feedback link; a new tab will appear where you can share feedback.
  • Click on the open a support case link; a new tab will appear where you can report a bug.
  • Click on the inform the direction of Red Hat link; a new tab will appear where you can enter your information to join the Red Hat mailing list.
  • Click Cancel and the modal should close.

Based on https://issues.redhat.com/browse/RFE-3775 we should extend our proxy package timeout to match the browser's timeout, which is 5 minutes.

AC: Bump the 30-second timeout in the proxy pkg to 5 minutes.

We currently implement fuzzy search in the console (project search, search resources page / list view pages). While we don't want to change the current search behavior, we would like to add some exact search capability for users that have similarly named resources where fuzzy search doesn't help narrow down the list of resources in a list view/search page.

RFE: https://issues.redhat.com/browse/RFE-3013
Customer bug: https://issues.redhat.com/browse/OCPBUGS-2603

Acceptance criteria:
  • All search pages in the console implement an exact-match search option for list pages, configured on the user preferences page.
  • Work with the UX team on the hints for the search options.

Design
Explore help text for search inputs. This should be shown at all times and not hidden in a popover.

Follow on to CONSOLE-2976

See https://github.com/openshift/console/blob/637a94a1e2e3e842cc5757ad2bbcf49fb1b4d2e1/frontend/public/components/cluster-settings/cluster-settings.tsx#L668-L670

Based on the API changes for the MCP, we need to check for the item whose `subject` key has the value `kube-apiserver-to-kubelet-signer` in the `status.certExpirys` array. For that item we will render the `expiry` value, which is in UTC format, as a timestamp.

 

AC:

  • Add a timestamp to the existing "Update paused" notification, using the appropriate `expiry` field (see the illustrative status snippet below).
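
For reference, the relevant status shape looks roughly like this (only the `subject` and `expiry` keys come from the description above; the surrounding structure and values are illustrative):

status:
  certExpirys:
  - subject: kube-apiserver-to-kubelet-signer
    expiry: "2024-05-21T14:05:53Z"   # UTC; rendered as a timestamp in the notification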

According to the security team, it is important to disable publicly available content from the OpenShift Web Console, which is served via `/opt/bridge/bin/bridge --public-dir=/opt/bridge/static --config=/var/console-config` in the console pod (openshift-console namespace).

The folder /opt/bridge/static and its files are publicly available without authentication. 
The purpose of this RFE is to disable the static assets:
https://console-openshift-console.apps.example.com/static/assets/
https://console-openshift-console.apps.example.com/static/

  1. Why does the customer need this? (List the business requirements here)
    The security department of the customer recommended disabling the static assets because they are available without authentication, even though they contain only images in PNG or SVG format.

 

Feature Overview (aka. Goal Summary)  

Extend the Installer's current capabilities when deploying OCP on a GCP shared VPC (XPN) by adding support for BYO hosted zones and removing the SA requirements in the bootstrap process.

Goals (aka. expected user outcomes)

While deploying OpenShift to a shared VPC (XPN) in GCP, the user can bring their own DNS zone in which to create the required records for the API server and Ingress, and no additional SA will be required to bootstrap the cluster.

Requirements (aka. Acceptance Criteria):

The user can provide an existing DNS zone when deploying OpenShift to a shared VPC (XPN) in GCP that will be used to host the required DNS records for the API server and Ingress. At the same time, today's SA requirements will be removed.

Background

While adding support for shared VPC (XPN) deployments in GCP, the BYO hosted zone capability was removed (CORS-2474) due to multiple issues found during the QE validation phase for the feature. At that time there was no evidence from customers/users that this was required for the shared VPC use case, so this capability was removed in order to declare the feature GA.

We now have evidence of this specific use case being required by users.

Documentation Considerations

Documentation about using this capability while deploying OpenShift to a shared VPC will be required.

Epic Goal

  • Remove the requirement for a separate Service Account and minimize permissions required during the Bootstrap process in GCP.

 

Background

The GCP bootstrap process creates a service account with the role roles/storage.admin. The role is required so that the service account can create a bucket to hold the bootstrap ignition file contents. As a security request from a customer, the service account created during this process can be removed. This means that not only will the service account, private key, and role not be created, but the bucket containing the bootstrap ignition file contents will also not be created in Terraform.

Why is this important?

  • Reduce number of permissions required to complete bootstrapping process.
  • Reduce unnecessary resources 

 

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • No additional service accounts should be created to complete an installation

 

Open questions::

  1.  

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a (user persona), I want to be able to:

  • Ensure that unnecessary roles/permissions are not assigned during install

Acceptance Criteria:

Description of criteria:

  • The service-account-user permission/role is not assigned during the gcp/cluster/masters terraform stage.

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview (aka. Goal Summary)  

The Assisted Installer is used to help streamline and improve the install experience of OpenShift UPI. Given the install footprint of OpenShift on IBM Power and IBM zSystems, we would like to bring the Assisted Installer experience to those platforms and ease the installation experience.

 

Goals (aka. expected user outcomes)

Full support of the Assisted Installer for use by IBM Power and IBM zSystems

 

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete.  Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc.  Initial completion during Refinement status.

 

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

As a multi-arch development engineer, I would like to evaluate if the assisted installer is a good fit for simplifying UPI deployments on Power and Z.

Acceptance Criteria

  • Evaluation report of market opportunity/impact by P&Z offering managers
  • Stories filed for delivering Assisted Installer.

 
After doing more tests on staging for Power, I have found that the cluster-managed network does not work for Power. It uses platform.baremetal to define the API VIP/Ingress VIP, and most of the installations have failed at the last step, finalizing. After more digging, I found that the machine-api operator is not able to start successfully and stays in the "Operator is initializing" state. Here is the list of pods with errors:

openshift-kube-controller-manager installer-5-master-1 0/1 Error 0 25m
openshift-kube-controller-manager installer-6-master-2 0/1 Error 0 17m
openshift-machine-api ironic-proxy-kgm9g 0/1 CreateContainerError 0 32m
openshift-machine-api ironic-proxy-nc2lz 0/1 CreateContainerError 0 8m37s
openshift-machine-api ironic-proxy-pp92t 0/1 CreateContainerError 0 32m
openshift-machine-api metal3-69b945c7ff-45hqn 1/5 CreateContainerError 0 33m
openshift-machine-api metal3-image-customization-7f6c8978cf-lxbj7 0/1 CreateContainerError 0 32m

The messages from the failed pod ironic-proxy-nc2lz:

Normal Pulled 11m kubelet Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4f84fd895186b28af912eea42aba1276dec98c814a79310c833202960cf05407" in 1.29310959s (1.293135461s including waiting)
Warning Failed 11m kubelet Error: container create failed: time="2023-04-06T15:16:19Z" level=error msg="runc create failed: unable to start container process: exec: \"/bin/runironic-proxy\": stat /bin/runironic-proxy: no such file or directory"

Similar errors appear for the other failed pods.
The interesting thing is that some of the installations completed successfully in AI, but these pods are still in an error state.
So I am asking the AI team to turn off cluster-managed network support for Power.

Description of the problem:

Power and Z features are not displayed in the feature usage dashboard in Elastic because there is a problem in the code;
see https://kibana-assisted.apps.app-sre-prod-04.i5h0.p1.openshiftapps.com/_dashboards/app/dashboards#/view/f75f85d0-989e-11ec-ab6b-650fa8ed1edf?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-2w,to:now))&_a=(description:'',filters:!(('$state':(store:appState),meta:(alias:'internal%20users',disabled:!t,index:bd9dadc0-7bfa-11eb-95b8-d13a1970ae4d,key:cluster.email_domain,negate:!f,params:!(redhat.com,ibm.com),type:phrases,value:'redhat.com,%20ibm.com'),query:(bool:(minimum_should_match:1,should:!((match_phrase:(cluster.email_domain:redhat.com)),(match_phrase:(cluster.email_domain:ibm.com))))))),fullScreenMode:!f,options:(hidePanelTitles:!f,useMargins:!t),panels:!((embeddableConfig:(),gridData:(h:11,i:c9bf6a4b-3c3a-4ad4-83ea-20b3127dc4a0,w:16,x:0,y:0),id:'44328ca6-de41-4b1e-befd-683bb51cf30f',panelIndex:c9bf6a4b-3c3a-4ad4-83ea-20b3127dc4a0,type:visualization,version:'1.3.2'),(embeddableConfig:(),gridData:(h:11,i:'759bd387-9f4b-45cb-9c9a-b3c412b420ec',w:16,x:16,y:0),id:ffbb52b5-dbd9-47c3-8098-75513cddca8e,panelIndex:'759bd387-9f4b-45cb-9c9a-b3c412b420ec',type:visualization,version:'1.3.2'),(embeddableConfig:(),gridData:(h:15,i:eb038dca-baf4-42d4-8e2c-298e2bbd06f6,w:16,x:32,y:0),id:'49b26e77-a2f3-42f3-8f57-9543669de8b8',panelIndex:eb038dca-baf4-42d4-8e2c-298e2bbd06f6,type:visualization,version:'1.3.2'),(embeddableConfig:(),gridData:(h:4,i:'05c0d27a-f949-42d6-a3ef-15411411fac7',w:16,x:0,y:11),id:'088f04c9-ce46-46d0-a381-1ea822d95440',panelIndex:'05c0d27a-f949-42d6-a3ef-15411411fac7',type:visualization,version:'1.3.2'),(embeddableConfig:(),gridData:(h:4,i:'0d696304-e4a4-4c24-beb4-f04d4af4c8d6',w:16,x:16,y:11),id:'9552a99a-4355-4e14-ad9f-90cd534f70a8',panelIndex:'0d696304-e4a4-4c24-beb4-f04d4af4c8d6',type:visualization,version:'1.3.2'),(embeddableConfig:(vis:!n),gridData:(h:10,i:ee55c626-4b30-4ac3-ad23-9b7efbf1fb04,w:16,x:0,y:15),id:fac35afd-9a6f-4bdc-868a-906a5f1e1894,panelIndex:ee55c626-4b30-4ac3-ad23-9b7efbf1fb04,type:visualization,version:'1.3.2'),(embeddableConfig:(),gridData:(h:10,i:c4c65cab-2675-4575-aa54-d4bb2871804e,w:32,x:16,y:15),id:'3747662f-7c12-4299-b2ac-1038e62ad2f3',panelIndex:c4c65cab-2675-4575-aa54-d4bb2871804e,type:visualization,version:'1.3.2'),(embeddableConfig:(),gridData:(h:13,i:'1ca7879c-a458-4ae3-8dc2-4dd2da59cf32',w:15,x:0,y:25),id:'96e57324-141d-46ea-8096-9e8b1a18ef62',panelIndex:'1ca7879c-a458-4ae3-8dc2-4dd2da59cf32',type:visualization,version:'1.3.2')),query:(language:kuery,query:''),timeRestore:!f,title:'%5BAI%5D%20feature_usage_dashboard',viewMode:edit) 

How reproducible:

100%

Steps to reproduce:

1. Install 2 clusters with Power and Z CPU architectures and check the feature usage dashboard in Elastic

Actual results:

Power and Z features are not displayed in the feature usage dashboard in Elastic.

Expected results:

The Power and Z features are displayed in the feature usage dashboard in Elastic.

Epic Goal

Feature Overview (aka. Goal Summary)  

The goal of this initiative is to help boost adoption of OpenShift on ppc64le. This can be further broken down into several key objectives.

  • For IBM, furthering adoption of OpenShift will continue to drive adoption of their Power hardware. In parallel, this can be used by existing customers to migrate their old Power on-prem workloads to a cloud environment.
  • For the Multi-Arch team, this represents our first opportunity to develop an IPI offering on one of the IBM platforms. Right now, we depend on IPI on libvirt to cover our CI needs; however, this is not a supported platform for customers. PowerVS would address this caveat for ppc64le.
  • By bringing in PowerVS, we can provide customers with the easiest possible experience to deploy and test workloads on IBM architectures.
  • Customers already have UPI methods to solve their OpenShift on-prem needs for ppc64le. This gives them an opportunity for a cloud-based option, furthering our hybrid-cloud story.

Goals (aka. expected user outcomes)

  • The goal of this epic is to begin the process of expanding support of OpenShift on ppc64le hardware to include IPI deployments against the IBM Power Virtual Server (PowerVS) APIs.

Requirements (aka. Acceptance Criteria):

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

 

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

 

Out of Scope

High-level list of items that are out of scope.  Initial completion during Refinement status.

 

Background

Provide any additional context is needed to frame the feature.  Initial completion during Refinement status.

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs.  Initial completion during Refinement status.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal

  • Improve IPI on Power VS in the 4.14 cycle
    • Changes to the installer to handle edge cases, fix bugs, and improve usability.
    • No major changes are anticipated this cycle.

Running doc to describe terminologies and concepts which are specific to Power VS - https://docs.google.com/document/d/1Kgezv21VsixDyYcbfvxZxKNwszRK6GYKBiTTpEUubqw/edit?usp=sharing

Feature Overview

Enable release managers/Operator authors to manage Operator releases in the file-based catalog (FBC) based on the existing catalog (in sqlite) and distribute them to multiple OCP versions with ease.

Goals

  • Operator releases can be managed declaratively in a canonical source of truth and automated via git in the context of the OpenShift release lifecycle.
  • A file-based catalog (FBC) can be converted back to the sqlite format in order to be distributed to those OCP versions that do not support file-based catalogs yet.
  • An existing catalog image in sqlite format can be converted to the basic template of the file-based catalog (FBC) for easy adoption.
  • An existing catalog image in sqlite format can be converted to the semver template of the file-based catalog (FBC) when possible, and/or the incomplete sections are highlighted so users can more easily identify the gaps.

Requirements

Requirement | Notes | isMvp?
A declarative mechanism to automate the catalog update process in the file-based catalog (FBC) with newly published bundle references. | | Yes
A declarative mechanism to publish Operator releases in the file-based catalog (FBC) to multiple OCP releases. | | Yes
A declarative mechanism to convert a file-based catalog (FBC) to the sqlite database format so it can be published to OCP versions without FBC support. | | Yes
A declarative mechanism to convert an existing catalog from a sqlite database to the file-based catalog (FBC) basic template. | | Yes
A declarative mechanism to convert an existing catalog from a sqlite database to the file-based catalog (FBC) semver template when possible, and/or highlight the incomplete sections so users can more easily identify the gaps. | | No
CI - MUST be running successfully with test automation | This is a requirement for ALL features. | Yes
Release Technical Enablement | Provide necessary release enablement details and documents. | Yes

Use Cases

  • Operator authors/release managers can manage releases (i.e., edit the update paths) in a canonical source of truth (in FBC) and automate it via git to simplify the bundle release process.
  • Operator authors/release managers can manage and publish Operator releases from a canonical source of truth (in FBC) to multiple OCP versions.
  • Operator authors/release managers can manage and publish Operator releases from a canonical source of truth (in FBC) to older OCP versions that do not support FBC yet.
  • Operator authors/release managers can convert their existing catalog images in sqlite format to the basic template of the file-based catalog (FBC) to jumpstart the catalog migration process.
  • Operator authors/release managers can convert their existing catalog images in sqlite format to the semver template of the file-based catalog (FBC) when possible to drive adoption, and/or have the incomplete sections highlighted so they can more easily identify the gaps.

Definition of Done / Acceptance criteria

  • All use cases above are implemented and meet the requirements.

Background, and strategic fit

A catalog maintainer frequently needs to make changes to an OLM catalog whenever a new software version is released, an existing version is promoted and released to a different channel, or an existing version is deprecated.  All of these often require non-trivial changes to the update graph of an Operator package.  The maintainers need a git- and human-friendly maintenance approach that allows reproducing the catalog at all times and is decoupled from the release of their individual software versions.

The original imperative catalog maintenance approach, which relies on `replaces`, `skips`, and `skipRange` attributes at the bundle level to define the relationships between versions and the update channels, is perceived as complicated by the Red Hat internal developer community.  Hence, the new file-based catalog (FBC) is introduced as a declarative and GitOps-friendly approach.

Furthermore, the concept of the so-called “template”, as an abstraction layer over the FBC, is introduced to simplify interacting with FBCs.  While the “basic template” serves as a simplified abstraction of an FBC with all the `replaces`, `skips`, and `skipRange` attributes supported and configurable at the package level, the “semver template” provides the capability to auto-generate an entire upgrade graph adhering to Semantic Versioning (semver) guidelines and consistent with best practices on channel naming.

Based on the feedback at KubeCon NA 2022, folks were generally excited about the features introduced with FBC and the UX provided by the templates.  What is still missing is the tooling to enable the adoption.

Therefore, it is important to allow users to:

  • convert the existing catalog image in sqlite format to the basic template of the file-based catalog (FBC) for easy adoption
  • convert the existing catalog image in sqlite format to the semver template of the file-based catalog (FBC) when possible and/or highlight the incomplete sections so users can more easily identify the gaps
  • automate the catalog update process using FBC with newly published bundle references
  • publish Operator releases in the file-based catalog (FBC) to multiple OCP releases
  • convert the file-based catalog (FBC) back to the sqlite database format so it can be published to OCP versions without FBC support

to help users adopt this novel file-based catalog approach and deliver value to customers with a faster release cadence and higher confidence. 

Documentation Considerations

  • The way "to automate the catalog update process in FBC with newly published bundle references" needs to be documented (in the context of "Developing Operators").
  • The way "to publish Operator releases in the file-based catalog (FBC) to multiple OCP releases" needs to be documented (in the context of "Developing Operators" and "Administrator Tasks").
  • The way "to convert the file-based catalog (FBC) to the sqlite database format so it can be published to OCP versions without FBC support" needs to be documented (in the context of "Developing Operators" and "Administrator Tasks").
  • The way "to convert an existing catalog from a sqlite database to the file-based catalog (FBC) basic template" needs to be documented (in the context of "Developing Operators").
  • The way "to convert an existing catalog from a sqlite database to the file-based catalog (FBC) semver template when possible and/or highlight the incomplete sections so users can more easily identify the gaps" needs to be documented (in the context of "Developing Operators").

 
 
 
 

 

Epic Goal

  • SQLite catalog maintainers need a solution to facilitate veneer adoption.  The easiest capability to provide is migration to the basic veneer.  In addition, the mechanism needs to omit any properties from the original source which are no longer relevant in the new format.

Why is this important?

  • Minimizing friction to veneer adoption is key to speeding the FBC transition

Scenarios

  1. Maintainer wants to update legacy catalog to veneer
  2. operator author wants to update their catalog contribution to veneer

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Documentation - MUST have supporting documentation easily available to catalog maintainers & operator authors
  •  

Open questions::

  1. for the migration path, is documentation of current solution (opm render +  yq/jq) sufficient or do we need to support in formal tooling (e.g. opm migrate + flag)?
  2. are there any other obsolete properties we need to omit from rendered FBC?

 

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

Previously, bundle deprecation was handled by assigning an `olm.deprecated` property to the olm.bundle object.  SQLite DBs had to have all valid upgrade edges backed by olm.bundle information in order to prevent foreign key violations.  This property meant that the bundle was to be ignored and never installed.

FBC has a simpler method for achieving the same goal: don't include the bundle.  Upgrade edges from it may still be specified, and the bundle will not be installable, as sketched below.

This likely requires an update to the opm code base in the neighborhood of https://github.com/operator-framework/operator-registry/blob/249ae621bb8fa6fc8a8e4a5ae26355577393f127/pkg/sqlite/conversion.go#L80
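
As a rough illustration (package and version names are made up), a channel entry can keep an upgrade edge from a deprecated version without the catalog rendering an olm.bundle blob for it:

schema: olm.channel
package: example-operator
name: stable
entries:
- name: example-operator.v1.2.0
  # example-operator.v1.1.0 is not rendered as an olm.bundle and has no entry of its
  # own, so it cannot be installed, but upgrades from it are still possible:
  replaces: example-operator.v1.1.0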

A/C:

  • CI/utest/e2e passes without flakes
  • appropriate documentation (all upstream) updated/reviewed

 

 

 

 

 

 

 

Feature Overview (aka. Goal Summary)  

This feature will track upstream work from the OpenShift Control Plane teams - API, Auth, etcd, Workloads, and Storage.

Goals (aka. expected user outcomes)

To continue and develop meaningful contributions to the upstream community including feature delivery, bug fixes, and leadership contributions.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

From https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#topologyspreadconstraints-field:

Note: The matchLabelKeys field is a beta-level field and is enabled by default in 1.27. You can disable it by disabling the MatchLabelKeysInPodTopologySpread feature gate (https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/).

Removing from the TP as the feature is enabled by default.

This is just cleanup work.
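
For context, a minimal pod-spec sketch using the upstream field (label key and values are illustrative; pod-template-hash is the canonical example from the upstream docs):

topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: example
  matchLabelKeys:
  - pod-template-hash   # skew is calculated only among pods of the same ReplicaSet revision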

Upstream K8s deprecated PodSecurityPolicy and replaced it with a new built-in admission controller that enforces the Pod Security Standards (see here for the motivation for the deprecation). There is an OpenShift-specific dedicated pod admission system called Security Context Constraints. Our aim is to keep the Security Context Constraints pod admission system while also allowing users to have access to the Kubernetes Pod Security Admission.

With OpenShift 4.11, we turned on Pod Security Admission with global "privileged" enforcement. Additionally, we set the "restricted" profile for warnings and audit. This configuration made it possible for users to opt their namespaces in to Pod Security Admission with the per-namespace labels. We also introduced a new mechanism that automatically synchronizes the Pod Security Admission "warn" and "audit" labels.

With OpenShift 4.15, we intend to move the global configuration to enforce the "restricted" pod security profile globally. With this change, the label synchronization mechanism will also switch into a mode where it synchronizes the "enforce" Pod Security Admission label rather than "audit" and "warn".
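
For reference, these are the per-namespace Pod Security Admission labels involved; the values shown illustrate the intended 4.15 end state for an opted-in namespace (the namespace name is made up):

apiVersion: v1
kind: Namespace
metadata:
  name: example
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted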

Goal

A guided installation user experience that interacts via prompts for the necessary inputs, informs the user of erroneous/invalid inputs, and provides status and feedback throughout the installation workflow in very few steps, and that works for disconnected, on-premises environments.

Installation is performed from a bootable image that doesn't contain cluster details or user details, since these details will be collected during the installation flow after booting the image on the target nodes.

This means that the image is generic and can be used to install an OpenShift cluster in any supported environment.

Why is this important?

Customers/partners desire a guided installation experience to deploy OpenShift with a UI that includes support for disconnected, on-premises environments, and which is as flexible in terms of configuration as UPI.

We have partners that need to provide an installation image that can be used to install new clusters in any location and for any user, since their business is to sell the hardware along with OpenShift, and OpenShift needs to be installable at the destination premises.

Acceptance Criteria

This should provide an experience closely matching the current hosted service (Assisted Installer), with the exception that it is limited to a single cluster because the host running the service will reboot and become a node in the cluster as part of the deployment process.

  • User can successfully deploy OpenShift using the installer's guided experience.
  • User can specify a custom registry for the disconnected scenario, which may include uploading a certificate and validating it.
  • User can specify node-network configurations, at a minimum: DHCP, Static IP, VLAN, and Bonds (see the illustrative snippet after this list).
  • User can use the same image to install clusters with different settings (collected during the installation).
  • Documentation is updated to guide user step-by-step to deploy OpenShift in disconnected settings with installer.
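
As an illustration of the kind of node-network configuration that must be expressible (a static IP on a VLAN, in roughly the NMState format the agent-based flow already consumes via NMStateConfig; interface names and addresses are made up):

interfaces:
- name: eth0
  type: ethernet
  state: up
- name: eth0.100
  type: vlan
  state: up
  vlan:
    base-iface: eth0
    id: 100
  ipv4:
    enabled: true
    dhcp: false
    address:
    - ip: 192.0.2.10
      prefix-length: 24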

Dependencies

  1. Guided installation onboarding design from UXD team.
  2. UI development

 

Epic Goal

  • Allow the user to select a host to be Node 0 interactively after booting the ISO. On each host the user would be presented with a choice between two options:
  1. Select this host as the rendezvous host (it will become part of the control plane)
  2. The IP address of the rendezvous host is: [Enter IP]

(If the former option is selected, the IP address should be displayed so that it can be entered in the other hosts.)

Why is this important?

  • Currently, when using DHCP the user must determine which IP address is assigned to at least one of the hosts prior to generating the ISO. (OpenShift requires infinite DHCP leases anyway, so no extra configuration is required but it does mean trying to manually match data with an external system.) AGENT-385 would extend a similar problem to static IPs that the user is planning to configure interactively, since in that case we won't have the network config to infer them from. We should permit the user to delay collecting this information until after the hosts are booted and we can discover it for them.

Scenarios

  1. In a DHCP network, the user creates the agent ISO without knowing which IP addresses are assigned to the hosts, then selects one to act as the rendezvous host after booting.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. AGENT-7

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Block services that depend on knowing the rendezvousIP from starting until the rendezvousIP configuration file created in AGENT-555 exists. This will probably take the form of just looping in node-zero.service until the file is present. The systemd configuration may need adjustments to prevent the service from timing out.

While we are waiting, a message should be displayed on the hardware console indicating what is happening.

Currently we use templating to set the NodeZero IP address in a number of different configuration files and scripts.

We should move this configuration to a single file (/etc/assisted/rendezvous-host.env) and reference it only from there, e.g. as a systemd environment file.

We also template values like URLs, because it is easier and safer to do this in golang (e.g. to use an IP address that may be either IPv4 or IPv6 in a URL) than in bash. We may need to include all of these variables in the file.

This will enable us to interactively configure the rendezvousIP in a single place.

Epic Goal

  • Have a friendly graphical user interface, running on node0, for performing an interactive installation

Why is this important?

  • Allows the WebUI to run in Agent based installation where we can only count on node0 to run it
  • Provides a familiar (close to SaaS) interface to walk through the first cluster installation
  • Interactive installation takes us closer to having generated images that serve multiple first cluster installations

Scenarios

  1. As an admin, I want to generate an ISO that I can send to the field to perform a friendly, interactive installation

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. Assisted-Service WebUI needs an Agent based installation wizard

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Modify the cluster registration code in the assisted-service client (used by create-cluster-and-infraenv.service) to allow creating the cluster given only the following config manifests:

  • ClusterImageSet
  • InfraEnv

If the following manifests are present, data from them should be used:

  • AgentPullSecret
  • NMStateConfig
  • extra manifests

Other manifests (ClusterDeployment, AgentClusterInstall) will not be present in an interactive install, and the information therein will be entered via the GUI instead.

A CLI flag or environment variable can be used to select the interactive mode.
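
A rough sketch of the minimal pair of manifests referred to above; API groups, versions, and values are best-effort illustrations rather than a verified contract:

apiVersion: hive.openshift.io/v1
kind: ClusterImageSet
metadata:
  name: openshift-example
spec:
  releaseImage: quay.io/openshift-release-dev/ocp-release:4.14.0-x86_64   # illustrative
---
apiVersion: agent-install.openshift.io/v1beta1
kind: InfraEnv
metadata:
  name: example-infraenv
spec:
  pullSecretRef:
    name: pull-secret
  sshAuthorizedKey: ssh-rsa AAAA...   # optional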

Goals

  • Make kubelet aware of underlying node shutdown event and trigger pod termination with sufficient grace period to shutdown properly
  • Handle node shutdown in cloud-provider agnostic way
  • Introduce a minimal shutdown delay in order to shut down the node as soon as possible (but not sooner)
  • Focus on handling shutdown on systemd based machines

Story 1

  • As a cluster administrator, I can configure the nodes in my cluster to allocate X seconds for my pods to terminate gracefully during a node shutdown (see the configuration sketch at the end of this section)

Story 2 (https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2000-graceful-node-shutdown#story-2)

  • As a developer I can expect that my pods will terminate gracefully during node shutdowns

 

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2000-graceful-node-shutdown 
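
A minimal sketch of what Story 1 could look like on OpenShift, assuming the upstream KubeletConfiguration fields are tuned through a KubeletConfig custom resource (the pool selector and durations are illustrative):

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: graceful-shutdown
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    shutdownGracePeriod: "300s"             # total time reserved for the node shutdown
    shutdownGracePeriodCriticalPods: "60s"  # portion of that time reserved for critical pods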

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

As an OpenShift developer, I want to have confidence that the graceful restart feature works and stays working in the future through various code changes. To that end, please add at least the following 2 E2E tests:

  • A valid pod/workload with a timeout that is respected by the system before shutdown.
  • A rogue pod that has an extremely high timeout that is not respected by the system.

Feature Overview (aka. Goal Summary)  

Overarching Goal
Move to using the upstream Cluster API (CAPI) in place of the current implementation of the Machine API for standalone OpenShift.
Phases 1 & 2 cover implementing base functionality for CAPI.
Phase 2 also covers migrating MAPI resources to CAPI.

 

Phase 2 Goal:  

  • Complete the design of the Cluster API (CAPI) architecture and build the core operator logic
  • attach and detach of load balancers for internal and external load balancers for control plane machines on AWS, Azure, GCP and other relevant platforms
  • manage the lifecycle of Cluster API components within OpenShift standalone clusters
  • E2E tests

For Phase 1: incorporating the assets from different repositories to simplify asset management.

 

Background, and strategic fit

  • Initially CAPI did not meet the requirements for cluster/machine management that OCP had; the project has moved on, and CAPI is a better fit now, with better community involvement.
  • CAPI has much better community interaction than MAPI.
  • Other projects are considering using CAPI and it would be cleaner to have one solution
  • Long term it will allow us to add new features more easily in one place vs. doing this in multiple places.

Acceptance Criteria

There must be no negative effect on customers/users of the MAPI; this API must continue to be accessible to them, though how it is implemented "under the covers", and whether that implementation leverages CAPI, is open.

Epic Goal

  • To create an operator to manage the lifecycle of Cluster API components within OpenShift standalone clusters

Why is this important?

  • We need to be able to install and lifecycle the Cluster API ecosystem within standalone OpenShift
  • We need to make sure that we can update the components via an operator
  • We need to make sure that we can lifecycle the APIs via an operator

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Several CAPI E2E tests for specific providers are present in the cluster-capi-operator repository.

We run these tests on every PR that lands on that repository.

In order to test rebases for the cluster-api providers, we also want to run these tests there to prove that rebase PRs do not break CAPI functionality.

DoD:

  • Set techpreview openshift E2E jobs for the cluster-api providers to make sure the build doesn't break the TP payload
  • Set techpreview CAPI E2E jobs for the cluster-api provider repositories for providers where a corresponding e2e test is present in the cluster-capi-operator
    • for now only AWS, GCP and IBM Cloud have E2Es
  • Add a target/script in the cluster-api provider repositories for running the E2E CAPI tests, where applicable.

Feature Overview (aka. Goal Summary)  

Add support to the OpenShift Installer to set up the field 'managedBy' on the Azure Resource Group

Goals (aka. expected user outcomes)

As a user, I want to be able to provide a new field in the Installer's manifest to configure the `managedBy` field on the Azure Resource Group.

Requirements (aka. Acceptance Criteria):

The Installer will provide a new field via the install-config manifest to be used to tag the Azure Resource Group with `managedBy`.
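
A hypothetical install-config fragment to illustrate the intent; the field name used here is an assumption for illustration only, and the final API shape is up to the Installer team:

platform:
  azure:
    region: eastus
    resourceGroupName: example-rg
    # hypothetical field name for this feature
    resourceGroupManagedBy: /subscriptions/<subscription-id>/resourceGroups/example-managing-rg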

Use Cases (Optional):

This is a requirement for the ARO SRE teams for their automation tool to identify these resources.

Background

ARO needs this field set for their automation tool in the background. See the doc for more details.

Documentation Considerations

This new additional field will need to be documented as any other field supported via the Install Config manifest


Feature Overview

  • Allow for the setting of the field 'managedBy' on the azure resource group.

Epic Goal

  • Users are able to set the 'managedBy' field of the azure resource group during installation.

Why is this important?

  • ARO needs this field set for their automation tool in the background.

Background

  • ARO needs this field set for their automation tool in the background. See the doc for more details.

Out of scope

  1. None.

Acceptance Criteria

  • The install config allows users to set a new field with the value for the 'managedBy' field.
  • Once the above field is set, the installation creates a new resource group with the 'managedBy' field set.

Dependencies (internal and external)

  1. The Terraform provider azurerm, which is used for installation in Azure, does not have the functionality to set this field and needs to be updated accordingly to facilitate this feature.

Customer Considerations

  1. None

Documentation Considerations

  1. Doc for new field in the install config needs to be added.

Interoperability Considerations

  1. None

Open questions::

  1. None

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As an ARO developer, I want to be able to:

  • Set the ManagedBy field for the Azure cluster resource group

so that

  •  Automatic Azure RBAC plumbing can work.

Acceptance Criteria:

  • The resource groups are created with the ManagedBy field set correctly.

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

 

Catch-all epic for cleanup work around the now non-machineconfig certificate bundles written by the MCO (kubelet, image registry)

Once we remove the certificates from the MachineConfigs, the ControllerConfig will be the canonical location for all certificates.

 

We should increase visibility either by adding a new ConfigMap that all components, the console, and users can read, or by bubbling up status better in the ControllerConfig object itself.

Feature Overview (aka. Goal Summary)  

Add support for NAT Gateways in Azure when deploying OpenShift on this cloud to manage outbound network traffic, and make this the default option for new deployments.

Goals (aka. expected user outcomes)

While deploying OpenShift on Azure, the Installer will configure NAT Gateways as the default method to handle outbound network traffic, so we can prevent the existing SNAT port exhaustion issues related to the currently configured default outboundType.

Requirements (aka. Acceptance Criteria):

The installer will use the NAT Gateway object from Azure to manage the outbound traffic from OpenShift.

The installer will create a NAT Gateway object per AZ in Azure so the solution is HA.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin.  Initial completion during Refinement status.

Background

Using a NAT Gateway for egress traffic is the approach recommended by Microsoft.

This is also a common ask from different enterprise customers, as with the current solution used by OpenShift for outbound traffic management in Azure they are hitting SNAT port exhaustion issues.

 

Interoperability Considerations

Which other projects and versions in our portfolio does this feature impact?  What interoperability test scenarios should be factored by the layered products?  Initial completion during Refinement status.

Epic Goal

  • Allow Control Plane Machine Sets to specify multiple Subnets in Azure to support NAT Gateways for egress traffic

Why is this important?

  • In order to avoid the SNAT port exhaustion issues in Azure, Microsoft recommends using NAT Gateways for outbound traffic management. As part of the NAT Gateway support enablement, the CPMS objects need to be able to support multiple subnets.

Scenarios

  1. One Nat Gateway per Availability Zone
  2. One Subnet per Availability Zone
  3. Multiple Subnets in multiple Availability Zones

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Dependencies (internal and external)

This work depends on the work done in CORS-2564

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story:

As a user, I want to be able to:

  • Allow Control Plane Machine Sets to specify multiple Subnets

so that I can achieve

  • One Nat Gateway per Availability Zone
  • One Subnet per Availability Zone
  • Multiple Subnets in multiple Availability Zones

Acceptance Criteria:

  • The ability to specify multiple Subnets

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Feature Overview (aka. Goal Summary)  

Customers can trust the metadata in our operator catalogs to reason about infrastructure compatibility and interoperability. Similar to OCPPLAN-7983, the requirement is that this data is present for every layered-product and Red Hat-released operator, and ideally also for ISV operators.

Today it is hard to validate the presence of this data due to the metadata format. This feature tracks introducing a new format, implementing the appropriate validation and enforcement of presence, as well as defining a grace period in which both formats are acceptable.

Goals (aka. expected user outcomes)

Customers can rely on the operator metadata as the single source of truth for capability and interoperability information instead of having to look up product-specific documentation. They can use this data to filter in on-cluster and public catalog displays as well as in their pipelines or custom workflows.

Red Hat Operators are required to provide this data and we aim for near 100% coverage in our catalogs.

Absence of this data can reliably be detected and will subsequently lead to gating in the release process.

Requirements (aka. Acceptance Criteria):

  • discrete annotations per feature that can be checked for presence as well as positive and negative values (see PORTENABLE-525)
  • support in the OCP console and RHEC for both the new and the older metadata annotations format
  • enforcement in ISV and RHT operator release pipelines
    • first with non-fatal warnings
    • later with blocking behavior if annotations are missing
    • the presence of ALL annotations needs to be checked in all pipelines / catalogs

Questions to Answer:

  • when can we rollout the pipeline tests?
    • only when there is support for visualization in the OCP Console and catalog.redhat.com
  • should operator authors use both, old and new annotations at the same time?
    • they can, but there is no requirement to do so; once the support in the console and RHEC is there, the pipelines will only check for the new annotations
  • what happens to older OCP releases that don't support the new annotations yet?
    • the only piece in OCP that is aware of the annotations is the console, and we plan to backport the changes all the way to 4.10

 

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.  Initial completion during Refinement status.

 

Documentation Considerations

  • we first need internal documentation for RHT and ISV teams that need to implement the change
  • when RHEC and Console are ready, we will update the external documentation and can point to that as the official source of truth

 

Interoperability Considerations

  • OCP Console will have to support the new format (see CONSOLE-3688) in parallel to the old format (as fallback) in all currently supported OCP versions

Epic Goal

  • Transparently support old and new infrastructure annotations format delivered by OLM-packaged operators

Why is this important?

  • As part of OCPSTRAT-288, we are looking to improve the metadata quality of Red Hat operators in OpenShift
  • via PORTENABLE-525 we are defining a new metadata format that supports the aforementioned initiative with more robust detection of individual infrastructure features via boolean data types

Scenarios

  1. A user can use the OCP console to browse through the OperatorHub catalog and filter for all the existing and new annotations defined in PORTENABLE-525
  2. A user reviewing an operator's details can see the supported infrastructures transparently, regardless of whether the operator uses the new or the existing annotations format

Acceptance Criteria

  • the new annotation format is supported in operatorhub filtering and operator details pages
  • the old annotation format keeps being supported in operatorhub filtering and operator details pages
  • the console will respect both the old and the new annotations format
  • when an operator provides data for a particular feature in both the old and the new annotation format, the annotations in the newer format take precedence (see the illustrative example after this list)
  • the newer infrastructure features from PORTENABLE-525 (tls-profiles and token-auth/*) do not have equivalents in the old annotation format, so evaluation does not need to fall back as described in the previous point
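
To make the two formats concrete, here is an illustrative CSV metadata fragment showing the older list-style annotation next to the newer per-feature boolean annotations (annotation keys reflect our current understanding of PORTENABLE-525 and may differ in detail):

metadata:
  annotations:
    # older format: a single JSON list of feature strings
    operators.openshift.io/infrastructure-features: '["disconnected", "proxy-aware"]'
    # newer format: one boolean annotation per feature; takes precedence when both are present
    features.operators.openshift.io/disconnected: "true"
    features.operators.openshift.io/proxy-aware: "true"
    features.operators.openshift.io/tls-profiles: "false"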

Dependencies (internal and external)

  1. none

Open Questions

  1. due to the non-intrusive nature of this feature, can we ship it in a 4.14.z patch release?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Overview (aka. Goal Summary)

When the cluster does not have v1 builds, console needs to either provide different ways to build applications or prevent erroneous actions.

Goals (aka. expected user outcomes)

Identify the build system in place and prompt user accordingly when building applications.

Requirements (aka. Acceptance Criteria):

The console will have to hide any workflows that rely solely on BuildConfigs when Pipelines is not installed.

Use Cases (Optional):

  1. As a developer, provide me with a default build option, and show options to override.
  2. As a developer, prevent me from trying to create applications if no build option is present on the cluster.

 

ODC Jira - https://issues.redhat.com/browse/ODC-7352

Problem:

When the cluster does not have v1 builds, console needs to either provide different ways to build applications or prevent erroneous actions.

Goal:

Identify the build system in place and prompt user accordingly when building applications.

Why is it important?

Without this enhancement, users will encounter issues when trying to create applications on clusters that do not have the default s2i setup.

Use cases:

  1. As a developer, provide me with a default build option, and show options to override.
  2. As a developer, prevent me from trying to create applications if no build option is present on the cluster.

Acceptance criteria:

The console will have to hide any workflows that rely solely on BuildConfigs when Pipelines is not installed.

If we detect Shipwright, then we can call that API instead of buildconfigs. We need to understand the timelines for the latter part, and create a separate work item for it.

If both buildconfigs and Shipwright are available, then we should default to Shipwright. This will be part of the separate work item needed to support Shipwright.

Dependencies (External/Internal):

Rob Gormley to confirm timelines for when customers will have the option to remove BuildConfigs from their clusters. That will determine whether we take on this work in 4.15 or 4.16.
 

Design Artifacts:

Exploration:

Note:

Description of problem:

Version-Release number of selected component (if applicable):
Tested with https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-08-21-033349

How reproducible:
Always with the latest nightly build when the Build and DeploymentConfig capabilities are disabled

Steps to Reproduce:
Create a 4.14 shared cloud and disable the capabilities for Samples, Builds and DeploymentConfigs

  1. Instead of ./openshift-install create cluster, run ./openshift-install create install-config
  2. Add this to the install-config.yaml:
capabilities:
  baselineCapabilitySet: None
  additionalEnabledCapabilities:
  - baremetal
  - Console
  - Insights
  - marketplace
  - Storage
  # - openshift-samples
  - CSISnapshot
  - NodeTuning
  - MachineAPI
  # - Build
  # - DeploymentConfig
  3. Start the cluster with ./openshift-install create cluster
  4. Log in to the new cluster and switch to the developer perspective

Actual results:
The following main navigation entries are missed:

  1. Add page
  2. Topology
  3. Observe
  4. Search
  5. Project

(Only Helm, ConfigMap and Secret is shown.)

The Add page still shows "Import from Git", which cannot be used to import a resource without the BuildConfig API.

Expected results:
All navigation items should be displayed.

The add page should not show "Import from Git" if the BuildConfig CRD isn't installed.

Additional info:

Feature Overview (aka. Goal Summary)

A guest cluster can use an external OIDC token issuer. This will allow machine-to-machine authentication workflows.

Goals (aka. expected user outcomes)

A guest cluster can configure OIDC providers to support the current capability (https://kubernetes.io/docs/reference/access-authn-authz/authentication/#openid-connect-tokens) and the future capability (https://github.com/kubernetes/kubernetes/blob/2b5d2cf910fd376a42ba9de5e4b52a53b58f9397/staging/src/k8s.io/apiserver/pkg/apis/apiserver/types.go#L164) with an API that (see the configuration sketch after this list):

  1. allows fixing mistakes
  2. alerts the owner of the configuration that it's likely that there is a misconfiguration (self-service)
  3. makes the distinction between product failure (the expressed configuration was not applied) and configuration failure (the expressed configuration was wrong) easy to determine
  4. makes cluster recovery possible in cases where the external token issuer is permanently gone
  5. allow (might not require) removal of the existing oauth server
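
A rough sketch of the upstream structured authentication configuration linked above; field names follow the upstream alpha API, and the issuer, audience, and claim values are illustrative:

apiVersion: apiserver.config.k8s.io/v1alpha1
kind: AuthenticationConfiguration
jwt:
- issuer:
    url: https://issuer.example.com
    audiences:
    - my-guest-cluster
  claimMappings:
    username:
      claim: email
      prefix: ""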

 

Requirements (aka. Acceptance Criteria):

A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.

Use Cases (Optional):

Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.

Questions to Answer (Optional):

Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.

Out of Scope

High-level list of items that are out of scope. Initial completion during Refinement status.

Background

Provide any additional context is needed to frame the feature. Initial completion during Refinement status.

Customer Considerations

Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.

Documentation Considerations

Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.

Interoperability Considerations

Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.

Goal

  • Provide API for configuring external OIDC to management cluster components
  • Stop creating oauth server deployment
  • Stop creating oauth-apiserver
  • Stop registering oauth-apiserver backed apiservices
  • See what breaks next

Why is this important?

  • need API starting point for ROSA CLI and OCM
  • need cluster that demonstrates what breaks next for us to fix

Scenarios

  1. ...

Acceptance Criteria

  • Dev - Has a valid enhancement if necessary
  • CI - MUST be running successfully with tests automated
  • QE - covered in Polarion test plan and tests implemented
  • Release Technical Enablement - Must have TE slides
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Technical Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Address technical debt around self-managed HCP deployments, including but not limited to

  • Include CA ConfigMaps in the trusted bundle for both the CPO and the Ignition Server, improving trust and security.
  • Create dual-stack clusters through the CLI with or without default values, ensuring flexibility and user preference in network management.
  • Utilize CLI commands to disable default sources, enhancing customizability.
  • Benefit from less intrusive remote write failure modes.
  • ...

Goal

  • Address all the tasks we didn't finish for the GA
  • Collect and track all missing topics for self-managed and agent provider

User Story:

As a (user persona), I want to be able to:

  • Create a dual-stack cluster with CIDRs using the HCP CLI

so that I can achieve

  • Creating dual-stack clusters directly with the HCP CLI.

Acceptance Criteria:

Description of criteria:

  • The HCP CLI can be used directly to create a dual-stack cluster.

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Description of the Problem:

When we deploy an IPv6/disconnected HostedCluster, the Ingress Cluster Operator reports Degraded with this message:

clusteroperator.config.openshift.io/ingress                                    4.14.0-0.nightly-2023-08-29-102237   True        False         True     43m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitive Failures: Canary route checks for the default ingress controller are failing) 

 

The canary route is also accessible from the ingress-operator pod using curl, but the Go code cannot reach it.

2023-08-31T16:23:07.264Z    ERROR    operator.canary_controller    wait/backoff.go:226    error performing canary route check    {"error": "error sending canary HTTP request to \"canary-openshift-ingress-canary.apps.hosted.hypershiftbm.lab\": Get \"https://canary-openshift-ingress-canary.apps.hosted.hypershiftbm.lab/\": socks connect tcp 127.0.0.1:8090->canary-openshift-ingress-canary.apps.hosted.hypershiftbm.lab:443: unknown error host unreachable"} 

 

After a debugging session, it looks like DNS resolution in the ingress operator through the SOCKS proxy (which also goes through the Konnectivity component) does not work properly, because it delegates the resolution to the hub cluster, which is not the desired behaviour.

Release Shipwright as OpenShift Builds GA

The scope of GA is:

  • Build strategies for S2I, Buildah, and Cloud-Native Buildpacks (as Dev Preview)
  • Shipwright CLI
  • Availability in OpenShift OperatorHub
  • Product announcement with "5 minutes, 5 clicks" tutorial.

Goals

This GA release is intended to make a fully-supported offering of OpenShift Builds driven by the Shipwright framework. This includes both CLI and Operator usage. All Red Hat supported build engines are supported (buildah, s2i), but the priority is to ensure that there are no blocking Buildpacks-related issues and to triage and resolve non-blocking issues.

The softer goal of this GA release is to start to draw users to the Shipwright ecosystem, which should allow them greater flexibility in bringing their CI/CD workloads to the OpenShift platform.
 

Use Cases

  • Allow developers to utilize build strategies including Buildah, BuildKit, Buildpacks, Kaniko, ko, and Source-to-Image for their applications.
  • Allow DevOps and platform teams to move CI/CD ecosystems to OpenShift in a fundamentally integrated manner.

Out of scope

Functionality and roadmap items not specifically related to improving support for Buildpacks.

Dependencies

No known external dependencies.

Background, and strategic fit

The overarching goal for OpenShift Builds is to provide an extensible and modular framework for integrating into development workflows. Interoperability should be considered a priority, and build strategy-specific code should be kept to a minimum or implemented in a manner such that support for other build strategies is not impacted wherever possible.

Assumptions

Shipwright is an upstream community project with its own goals and direction, and while we are involved heavily in the project, we need to ensure buy-in for our initiatives, and/or determine what functionality and features we are “willing” to accept as downstream-only.

No assumptions are made about hardware, software, or people resources.

Customer Considerations

None.

Documentation Considerations

Documentation will heavily rely on the upstream Shipwright documentation. Documentation plan is here.

What does success look like?

  • Customers are able to install the product on supported versions of OpenShift
  • Customers are able to use builds for OpenShift to build container images with s2i and buildah
  • Customers are able to interact with builds through the OpenShift Dev Console and productized CLI

QE Contact

  • Jitendar Singh

Impact

N/A

Related Architecture/Technical Documents

GA involves the status of Shipwright's Build, CLI, and Operator projects for the upstream version v0.12.0. More information can be found at https://shipwright.io

Done Checklist

  • Acceptance criteria are met
  • Non-functional properties of the Feature have been validated (such as performance, resource, UX, security, or privacy aspects)
  • User Journey automation is delivered
  • Support and SRE teams are provided with enough skills to support the feature in a production environment

Problem:

The OpenShift console currently has limited support for Shipwright builds. Additionally, this support is marked as Tech Preview and uses the alpha version of the API.

Goal:

Provide additional support for Shipwright builds, moving to the beta API and removing the Tech Preview labels.

Why is it important?

Supporting layered products

Use cases:

  1. <case>

Acceptance criteria:

  1. Use the latest API
  2. Remove the Tech Preview badge from the appropriate pages
  3. Improve the Shipwright details page status
  4. Improve the Shipwright data of the list view shown in the Shipwright tab of the Build page

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description of problem:
When the user selects All namespaces in the admin perspective and navigates to Builds > BuildConfigs or Builds > Shipwright Builds (if the operator is installed), the last run is selected based on its name only. The filter doesn't check that the Build / BuildRun is in the same namespace as the BuildConfig / Build.

Version-Release number of selected component (if applicable):
4.14 after we've merged ODC-7306 / https://github.com/openshift/console/pull/12809

How reproducible:
Always

Steps to Reproduce:

  1. Create two namespaces
  2. Create a BuildConfig with the same name in both namespaces
  3. Start a Build for each BuildConfig
  4. Navigate to Build > BuildConfigs and select "All namespaces"

For Shipwright Builds and BuildRuns you need to install the Pipelines operator and the Shipwright operator, create a Shipwright Build resource (to enable the SW operator), and create Builds and BuildRuns in two different namespaces.

You can find some Shipwright Build samples here: https://github.com/openshift/console/tree/master/frontend/packages/shipwright-plugin/samples

Actual results:
Both BuildConfigs are shown, but both show the same Build as the last run.

Expected results:
Both BuildConfigs should show and link the Build from their own namespace.

Additional info:
This issue also exists in Pipelines, but we track it in another bug so that the fix can be backported.

Description

As a user, I want to see similar information at similar places for the 3 different Build types.

Acceptance Criteria

  1. Build detail page
    1. Add a "BuildConfig" resource link below the status on the right side (status.config || metadata.annotation["openshift.io/build-config.name"])
    2. Move Started field from the left side to the right side below this link, and rename it to "Start time"
    3. Add "Completion time" (status.completionTimestamp)
    4. Add "Duration" based on the both fields before
  2. Pipeline Run detail page
    1. Add "Completion time" (status.completionTimestamp)

Additional Details:

See https://github.com/openshift/console/blob/master/frontend/packages/shipwright-plugin/src/components/buildrun-details/BuildRunSection.tsx#L19-L55
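As a rough illustration of how the "Duration" value could be derived from the two timestamps, here is a minimal sketch; the helper name is hypothetical, and the timestamps are assumed to be ISO 8601 strings as found on status.startTimestamp / status.completionTimestamp:

```tsx
// Hypothetical helper for the Build detail page: derive a display duration from the
// start and completion timestamps (both ISO 8601 strings on the Build status).
const formatBuildDuration = (startTimestamp?: string, completionTimestamp?: string): string => {
  if (!startTimestamp) {
    return '-';
  }
  const start = new Date(startTimestamp).getTime();
  // While the build is still running, fall back to "now" so the duration keeps ticking.
  const end = completionTimestamp ? new Date(completionTimestamp).getTime() : Date.now();
  const totalSeconds = Math.max(0, Math.round((end - start) / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0 ? `${minutes}m ${seconds}s` : `${seconds}s`;
};
```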

Description

As a user, I want to see the output image of a Shipwright Build on the list page. Before 4.13 the Developer console showed the Build output (full image string) and the Build status.message.

With 4.14 we show the latest BuildRun name, status, start time, and duration, but the output image is still interesting. See https://redhat-internal.slack.com/archives/C050MAQKD1A/p1688378025053659?thread_ts=1688371150.047769&cid=C050MAQKD1A

Acceptance Criteria

  1. Show the output image on the Shipwright Build list page
    1. If the URL is a cluster registry URL, show a link to an ImageStream
    2. If the URL is a remote registry URL, try to show just the last two path parts.

For example:

  1. image-registry.openshift-image-registry.svc:5000/christoph/my-build => ImageStream "my-build"
  2. quay.io/jerolimov/nodeinfo => Show nodeinfo or jerolimov/nodeinfo and link full https:// address?

Additional Details:
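A minimal sketch of the shortening logic described above; the internal registry host and the helper name are assumptions for illustration only:

```tsx
// Hypothetical helper: shorten a Shipwright Build's output image reference for the list view.
const INTERNAL_REGISTRY_HOST = 'image-registry.openshift-image-registry.svc:5000';

const shortenOutputImage = (image: string): { label: string; isImageStream: boolean } => {
  const [host, ...pathParts] = image.split('/');
  if (host === INTERNAL_REGISTRY_HOST) {
    // Cluster registry URL: the last path segment maps to an ImageStream name.
    return { label: pathParts[pathParts.length - 1], isImageStream: true };
  }
  // Remote registry URL: show just the last two path parts, e.g. "jerolimov/nodeinfo".
  return { label: pathParts.slice(-2).join('/') || image, isImageStream: false };
};
```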

Description

As a user, I want to see the latest build status in the Build list similar to Pipelines.

Acceptance Criteria

  1. BuildConfig list page
    1. should have a filter similar to the Build list page (New, Pending, Running ....)
    2. all columns should be sortable
    3. duration column should show the time until "now" if the latest run is running
    4. should have an action "Start last run" if there is a last run (see Build "Rebuild" action)
  2. Build list page
    1. all columns should be sortable
    2. should show a duration as well
    3. Rename column title "Created" to "Started" (similar to Pipeline Run lists)
    4. should have an action "Start last run" if there is a last run (see Build "Rerun" action)
  3. Shipwright Build list page
    1. should have a filter similar to the Shipwright BuildRun list page (New, Pending, Running ....)
    2. Rename column title "Age" to "Started" (similar to Pipeline Run lists)

Additional Details:

You might be able to improve on this code review: https://github.com/openshift/console/pull/12809#pullrequestreview-1471632918

Description

As a user, I want to see the latest build status in the Build list similar to Pipelines.

Acceptance Criteria

  1. Create a custom row component for OpenShift BuildConfigs (in dev-console) and show the latest OpenShift Build status for the related BuildConfig.
    1. Add a "Status" filter similar to SW BuildRuns
  2. Update the existing custom row component for Shipwright Builds (in shipwright-plugin) and show the latest Shipwright BuildRun status for the related Build. Remove/replace the current Output/Status columns.
    1. Update the "Status" filter and replace it with one similar to SW BuildRuns.
  3. Both list views should show the same columns: 
    1. Name
    2. Namespace – only in admin perspective when all namespaces are selected
    3. Last run
    4. Last run status
    5. Last run time
    6. Last run duration
  4. Create/update e2e tests for both list views, even though the shipwright-plugin tests currently do not run as part of our CI job.

Additional Details:

SW samples: https://github.com/openshift/console/tree/master/frontend/packages/shipwright-plugin/samples

The shipwright-plugin already contains code to render a status, age, and duration. PTAL: https://github.com/openshift/console/tree/master/frontend/packages/shipwright-plugin/src/components

 

For Pipelines we later switched from "getting the related PipelineRuns for each row" to a more performant solution that "loads all PipelineRuns" and then filters them on the client side. See https://github.com/openshift/console/pull/12071. We expect to do something similar here.

When multiple rows request the same API (get all PipelineRuns) our useK8sResource hook is smart enough to make just one API call.

To find all OpenShift Builds for one OpenShift BuildConfig, they need to be filtered by the label openshift.io/build-config.name=<BuildConfig name> (see the sketch below).
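A minimal sketch of the client-side filtering, assuming the useK8sWatchResource hook as exported by the dynamic plugin SDK (the internal console uses an equivalent hook); the custom hook name is illustrative:

```tsx
import * as React from 'react';
import { useK8sWatchResource, K8sResourceCommon } from '@openshift-console/dynamic-plugin-sdk';

const BUILD_CONFIG_LABEL = 'openshift.io/build-config.name';

// Sketch: watch all Builds in the namespace once and filter per BuildConfig on the client.
const useBuildsForBuildConfig = (namespace: string, buildConfigName: string) => {
  const [builds, loaded, loadError] = useK8sWatchResource<K8sResourceCommon[]>({
    groupVersionKind: { group: 'build.openshift.io', version: 'v1', kind: 'Build' },
    namespace,
    isList: true,
  });
  const filtered = React.useMemo(
    () =>
      (builds || []).filter((b) => b.metadata?.labels?.[BUILD_CONFIG_LABEL] === buildConfigName),
    [builds, buildConfigName],
  );
  return [filtered, loaded, loadError] as const;
};

export default useBuildsForBuildConfig;
```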

Feature Overview

Telecommunications providers continue to deploy OpenShift at the Far Edge. The acceleration of this adoption and the nature of existing Telecommunication infrastructure and processes drive the need to improve OpenShift provisioning speed at the Far Edge site and the simplicity of preparation and deployment of Far Edge clusters, at scale.

Goals

  • Simplicity The folks preparing and installing OpenShift clusters (typically SNO) at the Far Edge range in technical expertise from technician to barista. The preparation and installation phases need to be reduced to a human-readable script that can be utilized by a variety of non-technical operators. There should be as few steps as possible in both the preparation and installation phases.
  • Minimize Deployment Time A telecommunications provider technician or brick-and-mortar employee who is installing an OpenShift cluster, at the Far Edge site, needs to be able to do it quickly. The technician has to wait for the node to become in-service (CaaS and CNF provisioned and running) before they can move on to installing another cluster at a different site. The brick-and-mortar employee has other job functions to fulfill and can't stare at the server for 2 hours. The install time at the far edge site should be in the order of minutes, ideally less than 20m.
  • Utilize Telco Facilities Telecommunication providers have existing Service Depots where they currently prepare SW/HW prior to shipping servers to Far Edge sites. They have asked RH to provide a simple method to pre-install OCP onto servers in these facilities. They want to do parallelized batch installation to a set of servers so that they can put these servers into a pool from which any server can be shipped to any site. They also would like to validate and update servers in these pre-installed server pools, as needed.
  • Validation before Shipment Telecommunications Providers incur a large cost if forced to manage software failures at the Far Edge due to the scale and physical disparate nature of the use case. They want to be able to validate the OCP and CNF software before taking the server to the Far Edge site as a last minute sanity check before shipping the platform to the Far Edge site.
  • IPSec Support at Cluster Boot Some far edge deployments occur on an insecure network, and for that reason access to the host’s BMC is not allowed; additionally, an IPSec tunnel must be established before any traffic leaves the cluster once it is at the Far Edge site. It is not possible to enable IPSec on the BMC NIC, and therefore even after OpenShift has booted the BMC is still not accessible.

Requirements

  • Factory Depot: Install OCP with minimal steps
    • Telecommunications Providers don't want an installation experience, just pick a version and hit enter to install
    • Configuration w/ DU Profile (PTP, SR-IOV, see telco engineering for details) as well as customer-specific addons (Ignition Overrides, MachineConfig, and other operators: ODF, FEC SR-IOV, for example)
    • The installation cannot increase the in-service OCP compute budget (don't install anything other than what is needed for DU)
    • Provide ability to validate previously installed OCP nodes
    • Provide ability to update previously installed OCP nodes
    • 100 parallel installations at Service Depot
  • Far Edge: Deploy OCP with minimal steps
    • Provide site specific information via usb/file mount or simple interface
    • Minimize time spent at far edge site by technician/barista/installer
    • Register with desired RHACM Hub cluster for ongoing LCM
  • Minimal ongoing maintenance of solution
    • Some, but not all, telco operators do not want to install and maintain an OCP / ACM cluster at the Service Depot
  • The current IPSec solution requires a libreswan container to run on the host so that all N/S OCP traffic is encrypted. With the current IPSec solution this feature would need to support provisioning host-based containers.

 

A list of specific needs or objectives that a Feature must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP gets shifted, the feature shifts.  If a non MVP requirement slips, it does not shift the feature.

requirement Notes isMvp?
     
     
     

 

Describe Use Cases (if needed)

Telecommunications Service Provider Technicians will be rolling out OCP w/ a vDU configuration to new Far Edge sites, at scale. They will be working from a service depot where they will pre-install/pre-image a set of Far Edge servers to be deployed at a later date. When ready for deployment, a technician will take one of these generic-OCP servers to a Far Edge site, enter the site specific information, wait for confirmation that the vDU is in-service/online, and then move on to deploy another server to a different Far Edge site.

 

Retail employees in brick-and-mortar stores will install SNO servers and it needs to be as simple as possible. The servers will likely be shipped to the retail store, cabled and powered by a retail employee and the site-specific information needs to be provided to the system in the simplest way possible, ideally without any action from the retail employee.

 

Out of Scope

Q: how challenging will it be to support multi-node clusters with this feature?

Background, and strategic fit

< What does the person writing code, testing, documenting need to know? >

Assumptions

< Are there assumptions being made regarding prerequisites and dependencies?>

< Are there assumptions about hardware, software or people resources?>

Customer Considerations

< Are there specific customer environments that need to be considered (such as working with existing h/w and software)?>

< Are there Upgrade considerations that customers need to account for or that the feature should address on behalf of the customer?>

<Does the Feature introduce data that could be gathered and used for Insights purposes?>

Documentation Considerations

< What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)? >

< What does success look like?>

< Does this feature have doc impact?  Possible values are: New Content, Updates to existing content,  Release Note, or No Doc Impact>

< If unsure and no Technical Writer is available, please contact Content Strategy. If yes, complete the following.>

  • <What concepts do customers need to understand to be successful in [action]?>
  • <How do we expect customers will use the feature? For what purpose(s)?>
  • <What reference material might a customer want/need to complete [action]?>
  • <Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available. >
  • <What is the doc impact (New Content, Updates to existing content, or Release Note)?>

Interoperability Considerations

< Which other products and versions in our portfolio does this feature impact?>

< What interoperability test scenarios should be factored by the layered product(s)?>

Questions

Question Outcome
   

 

 

Epic Goal

  • Allow relocating a pre-installed single node OpenShift.
  • Upon deployment at the edge site, the SNO should allow reconfiguring certain cluster attributes (listed below).

Why is this important?

  • SNO installation takes a very long time and even if we optimize it we will not meet customer requirements (under 20 minutes from 0 to fully configured deployment).
  • The customers would like to sanity-check the SNO and CNF before shipment to Far Edge site.

Scenarios

  1. Install single node OpenShift on bare metal using the agent-based installer at the factory.
  2. Configure the node as a relocatable SNO cluster.
  3. Take a random pre-installed SNO, run the sanity checks, and ship it to the edge site.
  4. Upon boot at the edge, the node will get the configuration (via BMC, or some other method that will place the site-specific configuration at a known <path>).
  5. Once the configuration is identified in <path>, the SNO reconfiguration will kick in and apply all the configurations listed under "Reconfiguration requirements" (see below).

Acceptance Criteria

  • Successfully install single node OpenShift using a pre-baked image.
  • SNO installed with this method passes the conformance tests.

Customer Requirements

  • Simplicity - no steps required for technician to deploy SNO at Far Edge site
  • Minimize Deployment Time - 20m maximum time spent deploying at Far Edge site
  • Utilize Telco Facilities - OEM or Service Depot for installation
  • Validation Before Shipment - Sanity check SNO and CNF before shipment to Far Edge site
  • Limited  Network - Support deployments with static network or untrusted networks (IPSec tunnels at boot & no BMC access)
  • Time constraint - The process to boot the node and apply the configuration should take no more than 20 minutes

Reconfiguration requirements:

  1. IP address
  2. Hostname
  3. Cluster name
  4. Domain
  5. Pull Secret
  6. Proxy
  7. ICSP
  8. DNS server
  9. SSH keys

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

  1.  

  • Configure the static IP (during the initial "factory" installation) with nmstate.
  • Set the machine network to point to the network of this IP.
  • Add a node_ip hint according to the machine network (done automatically when using assisted/ABI).
  • Remove all current hacks (adding the env overrides to CRI-O and kubelet).
  • Check whether the NetworkManager pre-up script is still required.

 

Context
https://docs.google.com/document/d/1Ywi-llZbOt-YUmqx7I6jWQP_Rss4eM-uoYJwD7Z0fh0/edit
https://github.com/loganmc10/openshift-edge-installer/blob/main/edge/docs/RELOCATABLE.md

Epic Goal

  • Install SNO within 10 minutes

Why is this important?

  • SNO installation takes around 40+ minutes.
  • This makes SNO less appealing when compared to k3s/microshift.
  • We should analyze the SNO installation, figure out why it takes so long, and come up with ways to optimize it.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

  1. https://docs.google.com/document/d/1ULmKBzfT7MibbTS6Sy3cNtjqDX1o7Q0Rek3tAe1LSGA/edit?usp=sharing

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of problem:

When installing SNO with bootstrap in place the cluster-policy-controller hangs for 6 minutes waiting for the lease to be acquired. 

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

1.Run the PoC using the makefile here https://github.com/eranco74/bootstrap-in-place-poc
2.Observe the cluster-policy-controller logs post reboot

Actual results:

I0530 16:01:18.011988       1 leaderelection.go:352] lock is held by leaderelection.k8s.io/unknown and has not yet expired
I0530 16:01:18.012002       1 leaderelection.go:253] failed to acquire lease kube-system/cluster-policy-controller-lock
I0530 16:07:31.176649       1 leaderelection.go:258] successfully acquired lease kube-system/cluster-policy-controller-lock

Expected results:

Expected the bootstrap cluster-policy-controller to release the lease so that the cluster-policy-controller running post reboot won't have to wait for the lease to expire.

Additional info:

Suggested resolution for bootstrap in place: https://github.com/openshift/installer/pull/7219/files#diff-f12fbadd10845e6dab2999e8a3828ba57176db10240695c62d8d177a077c7161R44-R59

Description of problem:

While trying to figure out why it takes so long to install Single Node OpenShift, I noticed that the kube-controller-manager cluster operator is degraded for ~5 minutes due to:
GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused
I don't understand how the prometheusClient is successfully initialized, but we get a connection refused once we try to query the rules.
Note that if the client initialization fails, the kube-controller-manager won't set GarbageCollectorDegraded to true.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1. install SNO with bootstrap in place (https://github.com/eranco74/bootstrap-in-place-poc)

2. Monitor the cluster operators' status

Actual results:

GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.119.108:9091: connect: connection refused 

Expected results:

Expected the GarbageCollectorDegraded status to be false

Additional info:

It seems that for PrometheusClient to be successfully initialised it needs to successfully create a connection but we get connection refused once we make the query.
Note that installing SNO with this patch (https://github.com/eranco74/cluster-kube-controller-manager-operator/commit/26e644503a8f04aa6d116ace6b9eb7b9b9f2f23f) reduces the installation time by 3 minutes


The service-ca pod in the openshift-service-ca namespace takes a few minutes to start when installing SNO.

kubectl get events -n openshift-service-ca --sort-by='.metadata.creationTimestamp' -o custom-columns=FirstSeen:.firstTimestamp,LastSeen:.lastTimestamp,Count:.count,From:.source.component,Type:.type,Reason:.reason,Message:.message                      
FirstSeen              LastSeen               Count   From                                                                                              Type      Reason                 Message
2023-01-22T12:25:58Z   2023-01-22T12:25:58Z   1       deployment-controller                                                                             Normal    ScalingReplicaSet      Scaled up replica set service-ca-6dc5c758d to 1
2023-01-22T12:26:12Z   2023-01-22T12:27:53Z   9       replicaset-controller                                                                             Warning   FailedCreate           Error creating: pods "service-ca-6dc5c758d-" is forbidden: error fetching namespace "openshift-service-ca": unable to find annotation openshift.io/sa.scc.uid-range
2023-01-22T12:27:58Z   2023-01-22T12:27:58Z   1       replicaset-controller                                                                             Normal    SuccessfulCreate       Created pod: service-ca-6dc5c758d-k7bsd
2023-01-22T12:27:58Z   2023-01-22T12:27:58Z   1       default-scheduler                                                                                 Normal    Scheduled              Successfully assigned openshift-service-ca/service-ca-6dc5c758d-k7bsd to master1
 

It seems that creating the service-ca namespace early allows it to get the openshift.io/sa.scc.uid-range annotation and start running earlier. The service-ca pod is required for other pods (CVO and all the control plane pods) to start, since it creates the serving-cert.

  • I'm not sure this is a CVO issue, but I think CVO is the one creating the namespace; CVO also renders some manifests during bootkube, so it seems like the right component.

Description of problem:

The bootkube scripts spend ~1 minute failing to apply manifests while waiting for the openshift-config namespace to get created.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1.Run the POC using the makefile here https://github.com/eranco74/bootstrap-in-place-poc
2. Observe the bootkube logs (pre-reboot) 

Actual results:

Jan 12 17:37:09 master1 cluster-bootstrap[5156]: Failed to create "0000_00_cluster-version-operator_01_adminack_configmap.yaml" configmaps.v1./admin-acks -n openshift-config: namespaces "openshift-config" not found
....
Jan 12 17:38:27 master1 cluster-bootstrap[5156]: "secret-initial-kube-controller-manager-service-account-private-key.yaml": failed to create secrets.v1./initial-service-account-private-key -n openshift-config: namespaces "openshift-config" not found

Here are the logs from another installation showing that it's not 1 or 2 manifests that require this namespace to get created earlier:

Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-ca-bundle-configmap.yaml": failed to create configmaps.v1./etcd-ca-bundle -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-client-secret.yaml": failed to create secrets.v1./etcd-client -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-client-secret.yaml": failed to create secrets.v1./etcd-metric-client -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-serving-ca-configmap.yaml": failed to create configmaps.v1./etcd-metric-serving-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-metric-signer-secret.yaml": failed to create secrets.v1./etcd-metric-signer -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-serving-ca-configmap.yaml": failed to create configmaps.v1./etcd-serving-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "etcd-signer-secret.yaml": failed to create secrets.v1./etcd-signer -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "kube-apiserver-serving-ca-configmap.yaml": failed to create configmaps.v1./initial-kube-apiserver-server-ca -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "openshift-config-secret-pull-secret.yaml": failed to create secrets.v1./pull-secret -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "openshift-install-manifests.yaml": failed to create configmaps.v1./openshift-install-manifests -n openshift-config: namespaces "openshift-config" not found
Jan 12 17:38:10 master1 bootkube.sh[5121]: "secret-initial-kube-controller-manager-service-account-private-key.yaml": failed to create secrets.v1./initial-service-account-private-key -n openshift-config: namespaces "openshift-config" not found

Expected results:

Expected resources to be created successfully without having to wait for the namespace to be created.

Additional info:

 

Feature Overview

To give Telco Far Edge customers as much of the product support lifespan as possible, we need to ensure that OCP releases are "telco ready" when the OCP release is GA.

Goals

  • All Telco Far Edge regression tests pass prior to OCP GA
  • All new features that are TP or GA quality at the time of the release pass validation prior to OCP GA
  • Ensure Telco Far Edge KPIs are met prior to OCP GA

Requirements

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES
     
     
     
     
     

(Optional) Use Cases

This Section:

  • SNO DU 
  • C-RAN Hub of DUs on compact cluster or traditional cluster
  • CU on compact cluster or traditional cluster

Questions to answer…

  • What are the scale goals?
  • How many nodes must be provisioned simultaneously?

Out of Scope

  • N/A

Background, and strategic fit

Notes

Assumptions

  •  

Customer Considerations

  • ...

Documentation Considerations

No documentation required

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Create a new informing lane in CI which includes the functional tests from OCPVE-163 and the latency tests from the RHEL Shift Left initiative started by the Telco RAN team.
  • Enable the informing lane to run on bare metal on a consistent set of hardware.

Why is this important?

  • We document that running workloads on OpenShift with a realtime kernel works, but testing, in practice, is often done in a bespoke fashion by teams outside of OpenShift. This epic seeks to close the gap in automated integration testing of OpenShift running a realtime kernel on real metal hardware.

Scenarios

  1. https://docs.google.com/presentation/d/1NW8vEkP7zMd0vxWpD-p82srZljcAOtqLEhymuYcUQXQ/edit#slide=id.g1407d815407_0_5

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. OCPVE-163
  2. https://github.com/openshift/enhancements/blob/master/enhancements/support-for-realtime-kernel.md

Open questions:

  1. What are the cost/licensing requirements for metal hardware (Equinix?) to support this new lane?
    1. How many jobs do we run, and how often?
  2. How do we integrate the metal hardware with Prow?
  3. Who should own this lane long term?
  4. Does OpenShift make any performance guarantees/promises?

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature Goal

  • Definition of a CU Profile
  • Deployment of the CU profile on multi-node Bare Metal clusters using the RH declarative framework.

Why is this important?

  • Telcos will want minimal hands-on installs of all infrastructure.

Requirements

  1. CU infrastructure deployment and life-cycle management must be performed through the ZTP workflow toolset (SiteConfig, PolicyGen, ACM and ArgoCD)
  2. Performance tuning:
    • Non-RT kernel
    • Huge pages set per NUMA
  3. Day 2 operators:
    • SR-IOV network operator and sample configuration
    • OCS / ODF sample configuration, highly available storage
    • Cluster logging operator and sample configuration
  4. Additional features
    • Disk encryption (which?)
    • SCTP
    • NTP time synchronization
    • IPV4, IPV6 and dual stack options

Scenarios

  1. CU on a Three Node Cluster - zero touch provisioning and configuration
  2. CU can be on SNO, SNO+1 worker or MNO (up to 30 nodes)
  3. Cluster expansion
  4. y-stream and z-stream upgrade
  5. in-service upgrade (progressively update cluster)
  6. EUS to EUS upgrade

Acceptance Criteria

  • Reference configurations released as part of ZTP
  • Scenarios validated and automated (Reference Design Specification)
  • Lifecycle scenarios are measured and optimized
  • Documentation completed

Open questions:

  1. What kind of disk encryption is required?
  2. Should any work be done on ZTP cluster expansion?
  3. What KPIs must be met? CaaS CPU/RAM/disk budget KPIs/targets? Overall upgrade time, cluster downtime, number of reboots per node type targets? oslat/etc targets?

References:

  1. RAN DU/CU Requirements Matrix
  2. CU baseline profile 2020
  3. CU profile - requirements
  4. Nokia blueprints

https://docs.google.com/document/d/13Db7uChVx-2JXqAMJMexzHbhG3XLNLRy9nZ_7g9WbFU/edit#

Epic Goal

  • Enable setting node labels on the spoke cluster during installation

  • Right now we need to add roles; we need to check if additional labels are required

Why is this important?

Scenarios

  1. In the ZTP flow, the user would like to mark nodes with additional roles, like rt, storage, etc., in addition to the master/worker roles that we have right now and support by default

Acceptance Criteria

  • CI - MUST be running successfully with tests automated

Open questions:

  1. How do master/worker roles get applied to the nodes? Maybe we can use the same flow.
  2. Do we need to support only roles, or supply labels in general?
  3. Another alternative is to use https://github.com/openshift/assisted-service/blob/d1cde6d398a3574bda6ce356411cba93c74e1964/swagger.yaml#L4071; a remark is that this will work only for day1

 Modify the scripts in assisted-service/deploy/operator/ztp.
The following environment variables will be added:

MANIFESTS: JSON containing the manifests to be added for day1 flow.  The key is the file name, and the value is the content.

NODE_LABELS: Dictionary of dictionaries. The outer dictionary key is the node name and the value is the node labels (key, value) to be applied.

MACHINE_CONFIG_POOL: Dictionary of strings.  The key is the node name and the value is machine config pool name.

SPOKE_WORKER_AGENTS: Number of worker nodes to be added as part of day1 installation.  Default 0
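A minimal sketch of what these environment variables could contain; the node names, labels, pool names, and manifest content below are hypothetical examples used only to illustrate the shapes described above:

```tsx
// Hypothetical example values illustrating the expected shapes of the new variables.
const MANIFESTS = JSON.stringify({
  // key: file name, value: manifest content
  '50-extra-manifest.yaml': 'apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: example\n',
});

const NODE_LABELS = JSON.stringify({
  // outer key: node name, inner object: labels (key/value) to apply
  'worker-0': { 'node-role.kubernetes.io/rt': '', 'example.com/storage': 'true' },
});

const MACHINE_CONFIG_POOL = JSON.stringify({
  // key: node name, value: machine config pool name
  'worker-0': 'worker-rt',
});

const SPOKE_WORKER_AGENTS = '2'; // number of worker nodes added as part of day1 (default 0)
```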

Feature Overview

Reduce the OpenShift platform and associated RH provided components to a single physical core on Intel Sapphire Rapids platform for vDU deployments on SingleNode OpenShift.

Goals

  • Reduce CaaS platform compute needs so that it can fit within a single physical core with Hyperthreading enabled. (i.e. 2 CPUs)
  • Ensure existing DU Profile components fit within reduced compute budget.
  • Ensure existing ZTP, TALM, Observability and ACM functionality is not affected.
  • Ensure largest partner vDU can run on Single Core OCP.

Requirements

Requirement Notes isMvp?
CI - MUST be running successfully with test automation This is a requirement for ALL features. YES
Release Technical Enablement Provide necessary release enablement details and documents. YES
 
Provide a mechanism to tune the platform to use only one physical core. Users need to be able to tune different platforms. YES
Allow for full zero touch provisioning of a node with the minimal core budget configuration. Node provisioned with SNO Far Edge provisioning method - i.e. ZTP via RHACM, using DU Profile. YES
Platform meets all MVP KPIs YES

(Optional) Use Cases

  • Main success scenario: A telecommunications provider uses ZTP to provision a vDU workload on a Single Node OpenShift instance running on an Intel Sapphire Rapids platform. The SNO is managed by an ACM instance and its lifecycle is managed by TALM.

Questions to answer...

  • N/A

Out of Scope

  • Core budget reduction on the Remote Worker Node deployment model.

Background, and strategic fit

Assumptions

  • The more compute power available for RAN workloads directly translates to the volume of cell coverage that a Far Edge node can support.
  • Telecommunications providers want to maximize the cell coverage on Far Edge nodes.
  • To provide as much compute power as possible the OpenShift platform must use as little compute power as possible.
  • As newer generations of servers are deployed at the Far Edge and the core count increases, no additional cores will be given to the platform for basic operation, all resources will be given to the workloads.

Customer Considerations

  • ...

Documentation Considerations

Questions to be addressed:

  • What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
    • Administrators must know how to tune their Far Edge nodes to make them as computationally efficient as possible.
  • Does this feature have doc impact?
    • Possibly, there should be documentation describing how to tune the Far Edge node such that the platform uses as little compute power as possible.
  • New Content, Updates to existing content, Release Note, or No Doc Impact
    • Probably updates to existing content
  • If unsure and no Technical Writer is available, please contact Content Strategy. What concepts do customers need to understand to be successful in [action]?
    • Performance Addon Operator, tuned, MCO, Performance Profile Creator
  • How do we expect customers will use the feature? For what purpose(s)?
    • Customers will use the Performance Profile Creator to tune their Far Edge nodes. They will use RHACM (ZTP) to provision a Far Edge Single-Node OpenShift deployment with the appropriate Performance Profile.
  • What reference material might a customer want/need to complete [action]?
    • Performance Addon Operator, Performance Profile Creator
  • Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
    • N/A
  • What is the doc impact (New Content, Updates to existing content, or Release Note)?
    • Likely updates to existing content / unsure
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Description of problem:

After running tests on an SNO with the Telco DU profile for a couple of hours, kubernetes.io/kubelet-serving CSRs in the Pending state start showing up and accumulating over time.

Version-Release number of selected component (if applicable):

4.14.0-ec.3

How reproducible:

So far on 2 different environments

Steps to Reproduce:

1. Deploy SNO with Telco DU profile
2. Run system tests
3. Check CSRs status

Actual results:

oc get csr | grep Pending | wc -l
34

Expected results:

No Pending CSRs

Additional info:

This issue blocks retrieving pod logs.

Attaching must-gather and sosreport after manually approving CSRs.
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Complete Epics

This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were completed when this image was assembled

This epic contains all the Dynamic Plugins related stories for OCP release-4.14 and implementing Core SDK utils.

Epic Goal

  • Track all the stories under a single epic

Acceptance Criteria

We are missing the DeleteModal component in our console-dynamic-plugin-sdk, so it currently has to be copied when building a dynamic plugin.

 

AC:

  • Expose the DeleteModal component in our console-dynamic-plugin-sdk
  • Decouple the functionality from the console internal codebase.
  • Use the original DeleteModal component as a wrapper that will use the new exposed DeleteModal from console-dynamic-plugin-sdk
  • Review the new component's API before merge
  • Add TS docs to the migrated DeleteModal component

We are missing the AnnotationsModal component and the functions handling its input (e.g. onAnnotationSubmit) in our console-dynamic-plugin-sdk, so it currently has to be copied when building a dynamic plugin.

 

AC:

There are modules shared between the Console application and its dynamic plugins, as configured in

packages/console-dynamic-plugin-sdk/src/shared-modules.ts

For modules configured as "allowFallback: false" (default setting) we should validate the Console provided version range vs. plugin consumed version at webpack build time.

This allows us to detect potential compatibility problems in shared modules (i.e. plugin is built against a different version than what is provided by Console at runtime) when building dynamic plugins.

 

AC: Add validation for our shared modules of dynamic plugins

  • Changes in Console dynamic plugin SDK
    • add optional options argument to ConsoleRemotePlugin constructor
      • control JSON schema validation: validatePackageSchema, validateExtensionSchema
      • control extension integrity validation (via ExtensionValidator): validateExtensionIntegrity
      • control consumed shared module validation: validateSharedModules
  • Changes in Console dynamic demo plugin
      • update react-router and react-router-dom dependencies to Console provided semver range
      • update typing dependencies for react-router and react-router-dom
      • remove unused dependencies comment-json and read-pkg
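A minimal sketch of how a plugin's webpack config might toggle these validations; the option names are taken from the list above, but the exact option shape is an assumption that should be confirmed against the SDK release in use:

```tsx
// webpack.config.ts (sketch)
import { ConsoleRemotePlugin } from '@openshift-console/dynamic-plugin-sdk-webpack';
import { Configuration } from 'webpack';

const config: Configuration = {
  // ...entry, output, module rules, etc.
  plugins: [
    new ConsoleRemotePlugin({
      // Assumed defaults keep all validations enabled; option names as listed above.
      validatePackageSchema: true,      // JSON schema validation of the plugin package
      validateExtensionSchema: true,    // JSON schema validation of extension declarations
      validateExtensionIntegrity: true, // ExtensionValidator checks
      validateSharedModules: true,      // consumed vs. Console provided shared module versions
    }),
  ],
};

export default config;
```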

This epic contains all the console components that should be refactored to use Core SDK.

Epic Goal

  • Track all the stories under a single epic

Acceptance Criteria

Currently there is no good way for plugins to get the active namespace outside of resource pages. We should expose useActiveNamespace to support this. (useActiveNamespace is only exposed in the internal API.)

This seems important to pair with NamespaceBar, since it's unclear how to get the initial namespace from NamespaceBar. This is borderline a bug, since it's not clear how to use NamespaceBar without it. We should consider this for 4.12.

AC: 

  • Expose the `useActiveNamespace` in the console-dynamic-plugin-sdk pkg
  • Remove the internal implementation of `useActiveNamespace` from console-shared pkg
    • Note: the solution does not currently remove the implementation from console-shared; rather, useActiveNamespace will be left in console-shared and exposed via require and import statements.
  • Replace the imports

One of the requirements for adopting OpenShift Dynamic Plugin SDK (which is the new version of HAC Core SDK) is to bump the version of react-router to version 6.

For migration from v5 to v6 there is a `react-router-dom-v5-compat` package which should ease the migration process.

 

AC: Install the `react-router-dom-v5-compat` package into console repo and test for any regressions.

Epic Goal

Remove code that was added through the ACM integration across all of the console's codebase repositories

Why is this important?

Since the decision was made to stop the ACM integration, we as a team decided that it would be better to remove the unused code in order to avoid any confusion or regressions.

Acceptance Criteria

  • Identify all the places from which we need to remove the code that was added during the ACM integration.
  • Come up with a plan for how to remove the code from our repositories and CI
  • Remove the code from the console-operator repo
  • Start with code removal from the console repository

Scour through the console repo and mark all multicluster-related code for removal by adding a "TODO remove multicluster" comment.

 

AC:

  • All multicluster-related console code is marked with a "TODO remove multicluster" comment.

Revert "copiedCSVsDisabled" and "clusters" server flag changes in front and backend code.

 

AC:

  • "clusters" server flag and all references are removed from console repo
  • "copiedCSVsDisabled" server flag type updated to boolean type, and all references are updated accordingly.
  • remove these two fields from server API

One of the requirements for adopting OpenShift Dynamic Plugin SDK (which is the new version of HAC Core SDK) is to bump the version of react-router to version 6. With Console PR #12861 merged, both the Console web application and its dynamic plugins should now be able to start migrating from React Router v5 to v6.

 

As a team we decided that we are going to split the work per package, but for the core console we will split the work into standalone stories based on the migration strategy.
 
Console will keep supporting React Router v5 for two releases (end of 4.15) as per CONSOLE-3662.
 
How to prepare your dynamic plugin for React Router v5 to v6 migration:
[0] bump @openshift-console/dynamic-plugin-sdk-webpack dependency to 0.0.10
  • this release adds react-router-dom-v5-compat to Console provided shared modules

[1] (optional but recommended) bump react-router and react-router-dom dependencies to v5 latest
  • Console provided shared module version of react-router and react-router-dom is 5.3.4
  • DO NOT bump react-router and react-router-dom dependencies to v6!

[2] add react-router-dom-v5-compat dependency
  • Console provided shared module version of react-router-dom-v5-compat is 6.11.2
  • this package provides React Router v6 code which can interoperate with v5 code

[3] start migrating to React Router v6 APIs
  • v5 code is imported from react-router or react-router-dom
  • v6 code is imported from react-router-dom-v5-compat
  • follow the official React Router Migration Strategy

[4] (optional but recommended) use appropriate TypeScript typings for react-router and react-router-dom
  • Console uses @types/react-router version 5.1.20 and @types/react-router-dom version 5.3.3
  • note that react-router-dom-v5-compat already ships with its own typings

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide (https://github.com/remix-run/react-router/discussions/8753). This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

The SecretFormWrapper component in /frontend/public/components/secrets/create-secret.tsx needs to use the v6 useNavigate hook, which requires it to be converted from a class component to a functional component. 

AC: SecretFormWrapper component in create-secret.tsx is rewritten from class component to functional component.
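A minimal before/after sketch of the conversion pattern these stories follow; the component name, props, and path below are illustrative, not the actual console code:

```tsx
import * as React from 'react';
// v6 APIs are imported from the compat package while v5 routes are still mounted.
import { useNavigate } from 'react-router-dom-v5-compat';

// Before (v5 class component): navigation went through this.props.history.push('/some/path'),
// which is why the component must become a function component before it can use hooks.

// After (v6 function component): use the useNavigate hook instead of the history prop.
const ExampleFormWrapper: React.FC<{ cancelPath: string }> = ({ cancelPath }) => {
  const navigate = useNavigate();
  const onCancel = React.useCallback(() => navigate(cancelPath), [navigate, cancelPath]);
  return (
    <button type="button" onClick={onCancel}>
      Cancel
    </button>
  );
};

export default ExampleFormWrapper;
```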

Splitting off tile-view-page.jsx from CONSOLE-3687 into a separate story.

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

frontend/public/components/utils/tile-view-page.jsx contains a component that needs to use the v6 useNavigate hook, requiring it to be converted from a class component to a functional component.

AC: tile-view-page.jsx rewritten from class component to functional component. 

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide (https://github.com/remix-run/react-router/discussions/8753). This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

The EditYAML component in /frontend/public/components/edit-yaml.jsx needs to use the v6 useNavigate hook, which requires it to be converted from a class component to a functional component. 

AC: EditYAML component in edit-yaml.jsx is rewritten from class component to functional component. 

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide (https://github.com/remix-run/react-router/discussions/8753). This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

The following files in frontend/public/components/RBAC contain components that need to use the v6 useNavigate hook, requiring them to be converted from a class component to a functional component: 

  • bindings.jsx
  • edit-rule.jsx

AC: Listed components in frontend/public/components/RBAC rewritten from class component to functional component. 

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide (https://github.com/remix-run/react-router/discussions/8753). This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

The CheckBoxes_ component in /frontend/public/components/row-filter.jsx needs to use the v6 useNavigate hook, which requires it to be converted from a class component to a functional component. 

AC: CheckBoxes_ component in row-filter.jsx is rewritten from class component to functional component. 

AC: CheckBoxes_ component is removed from the codebase. 

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide (https://github.com/remix-run/react-router/discussions/8753). This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

The App component in /frontend/public/components/app.jsx needs to use the v6 useLocation hook, which requires it to be converted from a class component to a functional component. 

AC: App component in app.jsx is rewritten from class component to functional component. 

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide (https://github.com/remix-run/react-router/discussions/8753). This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

The FireMan component in /frontend/public/components/factory/list-page.tsx needs to use the v6 useParams and useLocation hooks, which requires it to be converted from a class component to a functional component. 

AC: FireMan component in list-page.tsx is rewritten from class component to functional component. 

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide (https://github.com/remix-run/react-router/discussions/8753). This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

The following files in frontend/public/components/cluster-settings contain components that need to use the v6 useNavigate hook, requiring them to be converted from a class component to a functional component: 

  • basicauth-idp-form.tsx
  • github-idp-form.tsx
  • gitlab-idp-form.tsx
  • google-idp-form.tsx
  • htpasswd-idp-form.tsx
  • keystone-idp-form.tsx
  • ldap-idp-form.tsx
  • openid-idp-form.tsx
  • request-header-idp-form.tsx

AC: Listed components in frontend/public/components/cluster-settings rewritten from class component to functional component. 

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide. This route now has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

The following files in frontend/public/components/utils contain components that need to use the v6 useNavigate hook, requiring them to be converted from class components to functional components: 

  • dropdown.jsx
  • kebab.tsx
  • tile-view-page.jsx

AC: Listed components in frontend/public/components/utils are rewritten from class components to functional components. 

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide (https://github.com/remix-run/react-router/discussions/8753). Such a route has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

The TemplateForm_ component in /frontend/public/components/instantiate-template.tsx needs to use the v6 useNavigate hook, which requires it to be converted from a class component to a functional component. 

AC: TemplateForm_ component in instantiate-template.tsx is rewritten from class component to functional component. 

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide (https://github.com/remix-run/react-router/discussions/8753). Such a route has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

The EventStream component in /frontend/public/components/events.jsx needs to use the v6 useParams hook, which requires it to be converted from a class component to a functional component. 

AC: EventStream component in events.jsx is rewritten from class component to functional component. 

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide (https://github.com/remix-run/react-router/discussions/8753). Such a route has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

The following files in frontend/public/components/modals contain components that need to use the v6 useNavigate hook, requiring them to be converted from class components to functional components: 

  • add-secret-to-workload.tsx
  • create-namespace-modal.jsx

AC: Listed components in frontend/public/components/modals are rewritten from class components to functional components. 

One of the steps of the migration strategy is to start using the v6 API for components that are passed to the `CompatRoute` component, based on the migration strategy guide (https://github.com/remix-run/react-router/discussions/8753). Such a route has both v5 and v6 routing contexts, so we can start migrating component code to v6.

If the component is a class component, you'll need to convert it to a function component first so that you can use hooks.

The StorageClassFormWithTranslation component in /frontend/public/components/storage-class-form.tsx needs to use the v6 useNavigate hook, which requires it to be converted from a class component to a functional component. 

AC: StorageClassFormWithTranslation component in storage-class-form.tsx is rewritten from class component to functional component.

Epic Goal*

Provide a way to tune the etcd latency parameters ETCD_HEARTBEAT_INTERVAL and ETCD_ELECTION_TIMEOUT.

 
Why is this important? (mandatory)

OCP4 does not have a way to tune the etcd parameters like timeouts, heartbeat intervals, etc. Adjusting these parameters indiscriminately may compromise the stability of the control plane. In scenarios where disk IOPS are not ideal (e.g. disk degradation, storage providers in cloud environments) these parameters could be adjusted to improve stability of the control plane while raising the corresponding warning notifications.

In the past:

The current default values on a 4.10 deployment
```
name: ETCD_ELECTION_TIMEOUT
value: "1000"
name: ETCD_ENABLE_PPROF
value: "true"
name: ETCD_EXPERIMENTAL_MAX_LEARNERS
value: "3"
name: ETCD_EXPERIMENTAL_WARNING_APPLY_DURATION
value: 200ms
name: ETCD_EXPERIMENTAL_WATCH_PROGRESS_NOTIFY_INTERVAL
value: 5s
name: ETCD_HEARTBEAT_INTERVAL
value: "100"
```
and these are modified as exceptions for specific cloud providers (https://github.com/openshift/cluster-etcd-operator/blob/master/pkg/etcdenvvar/etcd_env.go#L232-L254).

The guidance for latency among control plane nodes does not translate well to on-premise live scenarios: https://access.redhat.com/articles/3220991

 
Scenarios (mandatory) 

Defining an etcd-operator API to provide the cluster-admin the ability to set `ETCD_ELECTION_TIMEOUT` and `ETCD_HEARTBEAT_INTERVAL` within a certain range.

 
Dependencies (internal and external) (mandatory)

No external teams

Contributing Teams(and contacts) (mandatory) 

Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.

  • Development - etcd team
  • Documentation -
  • QE - 
  • PX - 
  • Others -

Acceptance Criteria (optional)

Provide some (testable) examples of how we will know if we have achieved the epic goal.  

Drawbacks or Risk (optional)

Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

  • CI Testing - Basic e2e automation tests are merged and completing successfully
  • Documentation - Content development is complete.
  • QE - Test scenarios are written and executed successfully.
  • Technical Enablement - Slides are complete (if requested by PLM)
  • Engineering Stories Merged
  • All associated work items with the Epic are closed
  • Epic status should be “Release Pending” 

Once we have the API for configuring the heartbeat interval and leader election timeouts from https://github.com/openshift/api/pull/1538 we will need to reconcile the tuning profile set on the API onto the actual etcd deployment.

This would require updating how we set the env vars for both parameters by first reading the operator.openshift.io/v1alpha1 Etcd "cluster" object and mapping the profile value to the required heartbeat and leader election timeout values in:
https://github.com/openshift/cluster-etcd-operator/blob/381ffb81706699cdadd0735a52f9d20379505ef7/pkg/etcdenvvar/etcd_env.go#L208-L254

For https://issues.redhat.com/browse/OCPBU-333 we need an enhancement proposal so we can go over the different options for how we want to allow configuration of the etcd heartbeat, leader election and any other latency parameters that might be required for OCPBU-333.

We're currently on etcd 3.5.9; with new CVE fixes and features implemented upstream, we want to rebase to the latest release.

Golang 1.20 update

In 3.5.9 we spent significant time updating all active releases to use go 1.19. With 3.5.10 the default version will be 1.20 (ETCD-481). We need to figure out whether it makes sense to bump the image again or rely on the go-toolbox team to give us a patched 1.19 release for the time being.

There have not been any code changes that require us to use 1.20.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.


Epic Goal

  • Collect on-prem installation data in order to be able to structure similar ELK dashboards as from SaaS deployments
  • Collect info of ZTP/CIM deployments
  • Collect info of BILLI deployments

Why is this important?

  • We want to track trends, and be able to analyze on-prem installations

Scenarios

  1. As a cluster administrator, I can provision and manage my fleet of clusters knowing that every data point is collected and sent to the Assisted Installer team without having to do anything extra. I know my data will be safe and secure and the team will only collect data they need to improve the product.
  2. As a developer on the assisted installer team, I can analyze the customer data to determine if a feature is worth implementing/keeping/improving. I know that the customer data is accurate and up-to-date. All of the data is parse-able and can be easily tailored to the graphs/visualizations that help my analysis.
  3. As a product owner, I can determine if the product is moving in the right direction based on the actual customer data. I can prioritize features and bug fixes based on the data.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. [Internal] MGMT-11244 Decision for which event streaming service used will determine the endpoint we send the data to

Previous Work (Optional):

 

 MGMT-11244: Remodeling of SaaS data pipeline

 

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
  1. Query assisted service events for this cluster when a cluster reaches an "end state"
    1. End states include the cluster being in the `error`, `cancelled`, or `installed` state
  2. Authenticate and send data to data streaming service

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.


Epic Goal

  • We need a new API that will allow us to skip cluster/host validations.
  • This API should have its own feature flag. 

Why is this important?

  • Some customers and partners have very specific HW that doesn't pass our validations, and we want to allow them to install anyway
  • Sometimes we have bugs in our validations that block people from installing, and we don't want our partners to be stuck because of us

Scenarios

  1. Example from kaloom:
    1. Kaloom has a very specific setup where VIPs can be reported as busy even though installation can proceed with them.
    2. Currently they need to override the VIPs in the install config to be able to install the cluster.
    3. After adding the new API they can just use it to skip this specific validation.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • A feature flag for this API should be added to the statistics calculator, and if it is set, cluster failures should not be counted.

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of the problem:

Documentation for the ignore validation API should be updated with the correct JSON string arrays:

  • JSON string arrays are (L53 and L62):
{ "ignored_host_validations": "[\"all\"]" "ignored_cluster_validations": "[\"all\"]" }

While it should be :

{ "host-validation-ids": "[\"all\"]", "cluster-validation-ids": "[\"all\"]" }

How reproducible:

 

Steps to reproduce:

1.

2.

3.

Actual results:

 

Expected results:

Description of the problem:

In staging, BE 2.17.0 - Ignore validation API has no validation for the values sent. For example:

curl -X 'PUT' 'https://api.stage.openshift.com/api/assisted-install/v2/clusters/be4cdbef-7ea6-48f6-a30a-d1169eeb38fb/ignored-validations'   --header "Authorization: Bearer $(ocm token)"   -H 'accept: application/json'   -H 'Content-Type: application/json'   -d '{
  "host-validation-ids": "[\"testTest\",\"HasCPUCoresForRole\"]",
  "cluster-validation-ids": "[]"       
}'

Stores:

 {"cluster-validation-ids":"[]","host-validation-ids":"[\"testTest\",\"HasCPUCoresForRole\"]"}

How reproducible:

100%

Steps to reproduce:

1.

2.

3.

Actual results:

 

Expected results:

Description of the problem:

In BE 2.16.0 Staging - while a cluster is in the installed or installing state, the ignore validation API changes the validations, but this should be blocked.

How reproducible:

100%

Steps to reproduce:

1. Send this call to an installed cluster

curl -i -X PUT 'https://api.stage.openshift.com/api/assisted-install/v2/clusters/${cluster_id}/ignored-validations'   --header "Authorization: Bearer $(ocm token)"   -H 'accept: application/json'   -H 'Content-Type: application/json' -d '{"host-validation-ids": "[\"all\"]", "cluster-validation-ids": "[\"all\"]"}'
 

2. Cluster validation is changed

3.

Actual results:

 

Expected results:

1. Proposed title of this feature request

Delete worker nodes using GitOps / ACM workflow

2. What is the nature and description of the request?

We use siteConfig to deploy a cluster using the GitOps / ACM workflow. We can also use siteConfig to add worker nodes to an existing cluster. However, today we cannot delete a worker node using the GitOps / ACM workflow. We need to go and manually delete the resources (BMH, nmstateConfig, etc.) and the OpenShift node. We would like to have the node deleted as part of the GitOps workflow.

3. Why does the customer need this? (List the business requirements here)

Worker nodes may need to be replaced for any reason (e.g. hardware failures), which may require deletion of a node.

If we are colocating OpenShift and OpenStack control planes on the same infrastructure (using OpenStack director operator to create OpenStack control plane in OCP virtualization), then we also have the use case of assigning baremetal nodes as OpenShift worker nodes or OpenStack compute nodes. Over time we may need to change the role of those baremetal nodes (from worker to compute or from compute to worker). Having the ability to delete worker nodes via GitOps will make it easier to automate that use case.

4. List any affected packages or components.

ACM, GitOps

In order to cleanly remove a node without interrupting existing workloads it should be cordoned and drained before it is powered off.

This should be handled by BMAC and should not interrupt processing of other requests. The best implementation I could find so far is in the kubectl code, but using that directly is a bit problematic as the call waits for all the pods to be stopped or evicted before returning. There is a timeout, but then we have to either give up after one call and remove the node anyway, or track multiple calls to drain across multiple reconciles.

We should come up with a way to drain asynchronously (maybe investigate what CAPI does).

We should allow for users to control removing the spoke node using resources on the hub.

For the ZTP-gitops case, this needs to be the BMH as they are not aware of the agent resource.

The user will add an annotation to the BMH to indicate that they want us to manage the lifecycle of the spoke node based on the BMH. Then, when the BMH is deleted we will clean the host and remove it from the spoke cluster.

Epic Goal

  • Implement pagination for the events (API ref).

Why is this important?

  • The list of events fetched by clients could be very long, and fetching all of them in a long-polling loop negatively impacts their performance.

Considerations

  • Features in current UI design
  • We must define the semantics of the "Filter by text" field. Right now it executes the filtering on the client.
    Once we have pagination in place, do we want this field to filter only entries on the active page, similar to what it does today, or should it execute a query so the filtering is performed on the BE?
  • API should contain information about the number of pages available given the number of entries per page the user would like to see.
  • API should return the current page number (see the sketch after this list).
  • Link to a Patternfly Table demo for reference on what data 
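
A minimal client-side sketch of how the UI could consume such a paginated events endpoint. The `limit`/`offset` query parameters, the `X-Total-Count` response header, and the `fetchEventsPage` helper are assumptions for illustration only, not the actual assisted-service API.

```
// Hypothetical pagination helper for the events list (not the actual assisted-service API).
type ClusterEvent = { event_time: string; message: string; severity: string };

interface EventsPage {
  events: ClusterEvent[];
  totalCount: number; // lets the UI compute how many pages are available
  page: number;       // current page number, echoed back to the caller
}

async function fetchEventsPage(
  clusterId: string,
  page: number,
  perPage: number,
): Promise<EventsPage> {
  // `limit`/`offset` are assumed parameter names for illustration only.
  const offset = (page - 1) * perPage;
  const resp = await fetch(
    `/api/assisted-install/v2/events?cluster_id=${clusterId}&limit=${perPage}&offset=${offset}`,
  );
  if (!resp.ok) {
    throw new Error(`Failed to fetch events: ${resp.status}`);
  }
  // Assume the total number of matching events is exposed via a response header.
  const totalCount = Number(resp.headers.get('X-Total-Count') ?? '0');
  const events = (await resp.json()) as ClusterEvent[];
  return { events, totalCount, page };
}
```

Returning the total count with every page keeps the page-count calculation trivial on the client while the actual filtering and slicing happen on the BE.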

Description of the problem:

In staging, UI 2.19.6 - In new cluster events - number of events is shown as "1-10 of NaN" instead of the real number

How reproducible:

100%

Steps to reproduce:

1.

2.

3.

Actual results:

 

Expected results:

Epic Goal

  • Today we have an API for feature support per OCP version but we don't have an API for feature support per architecture.
    For example, ODF is not supported on ARM, so we "hard-coded" a block for it in the UI and we will return a Bad Request if the user asks for it using the API.
    Now that we have more architectures, such as ppc64le and s390x, it becomes more complicated.

Why is this important?

  • We would like to use the same API for both the BE & UI, which we can maintain, instead of hard-coded per-architecture limitations in the UI

Scenarios

  1. We have 4 architectures: x86, arm, s390, ppc64
  2. We have a few features for each architecture:
    1. Static IP
    2. UMN
    3. Dual stack
    4. OLM operators: LVMS/ ODF/ CNV/ LSO/ MCE
    5. Platform type: vSphere, Nutanix
    6. Disk encryption
    7. CMN
    8. SNO
    9. heterogeneous clusters 
       

Acceptance Criteria

  • Test each feature and architecture combination both via UI & API.
  •  

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. we have this internal doc: https://docs.google.com/spreadsheets/d/1RmU5cMoQgN-5Rk5i13nwrRoqXv3nDZ4uo65PXQnsKNc/edit#gid=0 
  2. we have OpenShift Multi Architecture Component Availability Matrix page

Open questions::

Done Checklist

  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • UI  


Description of the problem:

Returning Bad Request on feature-support validation is colliding with the multi-platform feature. 

Whenever the user sets the CPU architecture to P or Z, the platform is changed to multi, causing loss of information and not failing the cluster registration/update

 

How reproducible:

Register a cluster with s390x as CPU architecture on OCP version 4.12 

 

Expected results:

Bad Request 

Description of the problem:

The method returns an empty object when calling GET v2/support-levels/features?openshift_version=X

How reproducible:

Call GET v2/support-levels/features?openshift_version=4.13

Steps to reproduce:

1. Call GET v2/support-levels/features?openshift_version=4.13

2.

3.

Actual results:

{}

Expected results:

{ FEATURE_A: supported, FEATURE_B: supported, ... }

Description of the problem:

BE 2.17.4 - (using API calls) creating a new cluster, PATCHing it with OLM operators and then creating a new infra-env with P/Z should be blocked, but is allowed

How reproducible:

100%

Steps to reproduce:

1. Create new cluster

 curl -X 'POST' \
   'https://api.stage.openshift.com/api/assisted-install/v2/clusters/' \
   --header "Authorization: Bearer $(ocm token)" \
   -H 'accept: application/json' \
   -H 'Content-Type: application/json' \
   -d '{
     "name": "s390xsno2413",
     "high_availability_mode": "Full",
     "openshift_version": "4.13",
     "pull_secret": "${pull_secret}",
 "base_dns_domain": "redhat.com",
     "cpu_architecture": "s390x",
     "disk_encryption": {
         "mode": "tpmv2",
         "enable_on": "none"
     },
     "tags": "",
 "user_managed_networking": true
 }'

2. Patch with OLM operators

curl -i -X 'PATCH'   'https://api.stage.openshift.com/api/assisted-install/v2/clusters/c05ba143-cf22-44ec-b1fd-edad5d8ca5a9'   --header "Authorization: Bearer $(ocm token)"   -H 'accept: application/json'   -H 'Content-Type: application/json'   -d '{
    "olm_operators":[{"name":"cnv"},{"name":"lso"},{"name":"odf"}]
}'

3. Create infra-env

curl -X 'POST'   'https://api.stage.openshift.com/api/assisted-install/v2/infra-envs'   --header "Authorization: Bearer $(ocm token)"   -H 'accept: application/json'   -H 'Content-Type: application/json'   -d '{
    "name": "tests390xsno_infra-env2",
    "pull_secret": "${pull_secret}",
    "cluster_id": "c05ba143-cf22-44ec-b1fd-edad5d8ca5a9",
    "openshift_version": "4.13",
    "cpu_architecture": "s390x"
}' 

Actual results:

Infra-env created

Expected results:
Should be blocked

Create a single place in assisted-service (update/register cluster) where we will return Bad Request in case the feature combination is not supported

Description of the problem:

Currently, installing a ppc64le cluster with Cluster Managed Networking enabled and a Minimal ISO is not supported.

 

Steps to reproduce:

1. Create ppc64le cluster with UMN enabled 

 

Actual results:

BadRequest

 

Expected results:

Created successfully 

After deprecating the old API and making sure the UI no longer uses it, remove the following endpoint and definitions:

 

/v2/feature-support-levels 

definitions:
  feature-support-levels:
  feature-support-level:
   

 

 

https://github.com/openshift/assisted-service/blob/c7f1e1cc034dbdb4629c7680c1b81b8cb362ef0b/swagger.yaml#L3657-L3682

Feature goal (what are we trying to solve here?)

Making sure we no longer support the OCP 4.8 and 4.9 releases once they reach EOL (on April 27, 2023).

DoD (Definition of Done)

Installation of OCP 4.8 and 4.9 is no longer possible in any of our envs.

Does it need documentation support?

As Assisted Installer documentation is embedded into the relevant OpenShift releases, no documentation changes are required. As those versions' docs are marked deprecated / decommissioned, so are the Assisted Installer parts.

Feature origin (who asked for this feature?)

Catching up with OpenShift, as part of the usual lifecycle policy (currently, the Extended Life phase for OCP 4.8 ends on April 27, 2023).

Reasoning (why it's important?)

  • Removing the burden from developers, testers and others in maintaining deprecated versions.

Competitor analysis reference

Not relevant.

Feature usage (do we have numbers/data?)

Numbers don't count for much here, as we're following the official policy for OpenShift. If a customer has any real need for OpenShift 4.8 or 4.9, they can go through the Support Exception process for OpenShift.

Regardless, as of today (Mar. 16, 2023) there's still some usage of OCP 4.8 & 4.9 but it's not very significant:

Feature availability (why should/shouldn't it live inside the UI/API?)

AFAIK UI shouldn't have any special code/configuration for OCP versions, so implementing the relevant pieces in the backend should suffice.

Feature goal (what are we trying to solve here?)

vSphere platform configuration is a bit different on OCP 4.13.

Changes needed:

DoD (Definition of Done)

  • Updated install-config without any deprecated parameter
  • Update the post installation guide

Does it need documentation support?

Yes

Feature origin (who asked for this feature?)

  • A Customer asked for it

    • Name of the customer(s)
    • How many customers asked for it?
    • Can we have a follow-up meeting with the customer(s)?

 

  • A solution architect asked for it

    • Name of the solution architect and contact details
    • How many solution architects asked for it?
    • Can we have a follow-up meeting with the solution architect(s)?

 

  • Internal request

    • Who asked for it?

 

  • Catching up with OpenShift

Reasoning (why it’s important?)

  • Please describe why this feature is important
  • How does this feature help the product?

Competitor analysis reference

  • Do our competitors have this feature?
    • Yes, they have it and we can have some reference
    • No, it's unique or explicit to our product
    • No idea. Need to check

Feature usage (do we have numbers/data?)

  • We have no data - the feature doesn’t exist anywhere
  • Related data - the feature doesn’t exist but we have info about the usage of associated features that can help us
    • Please list all related data usage information
  • We have the numbers and can relate to them
    • Please list all related data usage information

Feature availability (why should/shouldn't it live inside the UI/API?)

  • Please describe the reasoning behind why it should/shouldn't live inside the UI/API
  • If it's for a specific customer we should consider using AMS
  • Does this feature exist in the UI of other installers?

Manage the effort for adding jobs for release-ocm-2.8 on assisted installer

https://docs.google.com/document/d/1WXRr_-HZkVrwbXBFo4gGhHUDhSO4-VgOPHKod1RMKng

 

Merge order:

  1. Add temporary image streams for Assisted Installer migration - day before (make sure images were created)
  2. Add Assisted Installer fast forwards for ocm-2.x release <depends on #1> - need approval from test-platform team at https://coreos.slack.com/archives/CBN38N3MW 
  3. Branch-out assisted-installer components for ACM 2.(x-1) - <depends on #1, #2> - At the day of the FF
  4. Prevent merging into release-ocm-2.x - <depends on #3> - At the day of the FF
  5. Update BUNDLE_CHANNELS to ocm-2.x on master - <depends on #3> - At the day of the FF
  6. ClusterServiceVersion for release 2.(x-1) branch references "latest" tag <depends on #5> - After  #5
  7. Update external components to AI 2.x <depends on #3> - After a week, if there are no issues update external branches
  8. Remove unused jobs - after 2 weeks

 

Epic Goal

  • Enable replacing master as day2 operation

Why is this important?

  • There is currently no simple way to replace a failing control plane node. The IPI install method has network infrastructure requirements that the assisted installer does not, and the UPI method is complex enough that the target user for the assisted installer may not feel comfortable doing it.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. ...

Open questions::

  1. ...

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of the problem:

Update the following Day2 procedure - https://github.com/openshift/assisted-service/blob/master/docs/user-guide/day2-master/411-healthy.md

  1. In the "Add BareMetalHost object" and "Add Machine object" sections we should add the `oc apply -f <filename>` command so users will know what they need to do
  2. In the "Link BMH and Machine and Node using the magic script" section, `custom-master3-2` should be changed to `custom-master3-chocobomb` according to the YAML examples there.

How reproducible:

 

Steps to reproduce:

1.

2.

3.

Actual results:

 

Expected results:

Description of the problem:

It is possible to create a manifest with a file name such as:
ee ll ii aa.yml

How reproducible:

 

Steps to reproduce:

1. Create a cluster

2. Add a manifest with spaces in the file name

3.

Actual results:

  It allows adding the manifest

Expected results:
Even if the BE allows it, we should consider disabling the option (see Slack thread) in the UI and BE

Description of the problem:

V2CreateClusterManifest should block empty manifests

How reproducible:

100%

Steps to reproduce:

1. POST V2CreateClusterManifest manifest with empty content

Actual results:

Succeeds. Then silently breaks bootkube much later.

Expected results:
API call should fail immediately

Description of the problem:
In the File name field of the custom manifest, there should be description pop-up text which tells the user what type of file needs to be added and the max size.
 

How reproducible:

 

Steps to reproduce:

1.create cluster with manifest

2.Navigate to custom manifest wizard

3. click on add new manifest

Actual results:

 File name label has no further description text

Expected results:

I suggest adding the file type and max size/length

Description of the problem:

When installing any cluster without a custom manifest, the installation summary page shows names of custom manifests.

It is not clear whether those manifests were added by the customer or by the AI

How reproducible:
100%
 

Steps to reproduce:

1. Install a cluster without the custom manifest option checked

2. After installation completes, check the cluster summary

3.

Actual results:
In the summary, several files are mentioned in the custom manifest section and presented as custom manifests

Expected results:
The user should be informed which of these custom manifests were added from the UI and which were not

We are looking into allowing users to rename the manifest file name. Currently this is only possible by issuing DELETE and POST requests, which results in a very bad UX.

We need an API to allow users to change the folder, file name, or YAML content of an existing custom manifest.

Discussion about that: https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1675776466170339
 

When the control plane nodes are under pressure or the apiserver is just not available, no telemetry data is emitted by the monitoring stack, even though monitoring doesn't run on the master nodes and shouldn't have to interact with the control plane in order to push metrics.

This is caused by the fact that today telemeter-client evaluates PromQL expressions on Prometheus via an oauth-proxy endpoint that requires talking to the apiserver for authentication.

After discussing with Simon Pasquier, a potential solution to remove the dependency on the apiserver would be to use mTLS communication between telemeter-client and the Prometheus pods.

Today, there are 3 proxies in the Prometheus pods:

  • oauth proxy for the API
  • kube-rbac-proxy for prometheus metrics
  • kube-rbac-proxy for thanos sidecar

The kube-rbac-proxy exposing the /metrics endpoint could be used by telemeter-client since it is already doing so via mTLS.

Note that this approach would require improving telemeter-client since it doesn't support configuring TLS certs/keys.

Epic Description

This is the second part of Customizations for Node Exporter, following https://issues.redhat.com/browse/MON-2848
There are the following tasks remaining:

  • On/off switch for these collectors:
    • systemd
    • hwmon
    • mountstats (pending decision which metrics to collect)
    • ksmd
  • General options for node-exporter
    • maxprocs

 

The "mountstats" collector generates 53 high-cardinality metrics by default, we have to refine the story to choose only the necessary metrics to collect.

 

Cluster Monitoring Operator uses the configmap "cluster-monitoring-config" in the namespace "openshift-monitoring" as its configuration. These new configurations will be added into the section  "nodeExporter".

Node Exporter comes with a set of default activated collectors and optional collectors.

To simplify the configuration, we put a config object for each collector that we allow users to activate or deactivate.

If a collector is not present in the config, no change is made to its default on/off status. 

Each collector has a field "enabled" as an on/off switch. If "enabled" is set to "false", other fields can be omitted.

The default value for the new options are:

  • collectors
    • systemd
      • enabled: bool, default: false
    • hwmon
      • enabled: bool, default: true
    • mountstats
      • enabled: bool, default: false
    • ksmd
      • enabled: bool, default: false
  • maxProcs: int, default: 0

Here is an example of what these options look like in CMO configmap:

apiVersion: v1
kind: ConfigMap
metadata: 
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data: 

  config.yaml: |
    nodeExporter:
      maxProcs: 4
      collectors:
        hwmon:
          enabled: true
        mountstats:
          enabled: true
        systemd:
          enabled: true
        ksmd:
          enabled: true


 

If the config for nodeExporter is omitted, Node Exporter should run with the same arguments concerning collectors as those in CMO v4.12:

 
--no-collector.wifi
--collector.filesystem.mount-points-exclude=^/(dev|proc|sys|run/k3s/containerd/.+|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
--collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*)$
--collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*)$
--collector.cpu.info
--collector.textfile.directory=/var/node_exporter/textfile
--no-collector.cpufreq
--no-collector.tcpstat
--collector.netdev
--collector.netclass
--no-collector.buddyinfo

 

 

 

Node Exporter has been upgraded to 1.5.0.
The default value of the argument `--runtime.gomaxprocs` is set to 1, which differs from the old behavior: Node Exporter used to take advantage of multiple processors to accelerate metrics collection.
We are going to add a parameter to set the argument `--runtime.gomaxprocs` and make its default value 0, so that CMO retains the old behavior while allowing users to tune the multiprocessing settings of Node Exporter.

The CMO config will have a new section `nodeExporter`, under which there is the parameter `maxProcs`, accepting an integer as the maximum number of processes Node Exporter runs concurrently. Its default value is 0 if omitted.

 config.yaml: |

    nodeExporter: 
      maxProcs: 1

We will add a section for "mountstats" Collector in "nodeExporter.collectors" section in CMO configmap. 

It has a boolean field "enabled", the default value is false.

 

apiVersion: v1
kind: ConfigMap
metadata: 
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data: 

  config.yaml: |

    nodeExporter: 
      collectors: 
        # enable a collector which is disabled by default
        mountstats: 
          enabled: true

 

The "mountstats" collector generates many high cardinality metrics, so we will collector only these metrics to avoid data congestion:

1. node_mountstats_nfs_read_bytes_total
2. node_mountstats_nfs_write_bytes_total
3. node_mountstats_nfs_operations_requests_total

 

refer to: https://issues.redhat.com/browse/OBSDA-293

We will add a section for "systemd" Collector in "nodeExporter.collectors" section in CMO configmap. 

It has a boolean field "enabled", the default value is false.

apiVersion: v1
kind: ConfigMap
metadata: 
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data: 

  config.yaml: |

    nodeExporter: 
      collectors: 
        # enable a collector which is disabled by default
        systemd: 
          enabled: true

 

To avoid scraping too many metrics from systemd units, the collector should collect metrics for selected units only. We put regex patterns of the units to collect in the `collectors.systemd.units` list.
 
 

apiVersion: v1
kind: ConfigMap
metadata: 
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data: 
  config.yaml: |    
    nodeExporter: 
      collectors: 
        # enable a collector which is disabled by default
        systemd: 
          enabled: true
          units: 
          - iscsi-init.*
          - sshd.service

 

 

refer to: https://issues.redhat.com/browse/OBSDA-214

We will add a section for "ksmd" Collector in "nodeExporter.collectors" section in CMO configmap. 

It has a boolean field "enabled", the default value is false.

apiVersion: v1
kind: ConfigMap
metadata: 
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data: 

  config.yaml: |

    nodeExporter: 
      collectors: 
        # enable a collector which is disabled by default
        ksmd: 
          enabled: true

refer to: https://issues.redhat.com/browse/OBSDA-308

 

Proposed title of this feature request

In 4.11 we introduced the alert overrides and alert relabeling features as Tech Preview. We should graduate these features to GA.

What is the nature and description of the request?

This feature can address requests and issues we have seen from existing and potential customers. Moving this feature to GA would greatly enable adoption.

Why does the customer need this? (List the business requirements)

See linked issues.

List any affected packages or components.

CMO

Epic Goal

Why is this important?

  • Monitoring console pages should be visible when CMO is present.

Scenarios

  1. ...

Acceptance Criteria

  • Console plugin is deployed by CMO.

Dependencies (internal and external)

https://github.com/openshift/monitoring-plugin

Previous Work (Optional):

  1. https://issues.redhat.com/browse/OU-34
  2. Example resource for a similar logging plugin: https://github.com/openshift/logging-view-plugin/blob/main/logging-view-plugin-resources.yml

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description

CMO should deploy and enable the monitoring plugin.

Acceptance Criteria

  • Monitoring Plugin must be deployed by CMO
  • The Plugin should be enabled by default
  • Must have an e2e test to verify that the monitoring plugin is deployed properly

 

Other Notes/ Considerations

  • Initial implementation only applies static YAML manifests
  • Need to parameterize the image
  • CVO needs to be modified to provide the image URL for the plugin image

Epic Goal

  • Enable static code analysis in cluster-monitoring-operator
  • Create a suitable config for the selected analyzers to allow for ignoring issues that are deemed safe or ignoring portions of the code (like tests)
  • set up PR checks

Why is this important?

  • static code analysis can reduce certain classes of bugs
  • it can highlight unused code
  • it enforces consistent code quality

 

We should run at least https://github.com/golangci/golangci-lint. 

https://github.com/securego/gosec could be interesting.

We also have an internal team: https://gitlab.cee.redhat.com/covscan/covscan/-/wikis/home. Maybe there are additional scanners we can possibly run.

Acceptance Criteria

  • CI - set up PR checks
  • Run at least golangci-lint
  • Fix existing issues or create exceptions in the relevant config files.


Description of problem:

https://github.com/openshift/cluster-monitoring-operator/pull/1989

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 


Epic Goal

  • Users can today specify resource requirements for some of the components:
    • Prometheus (in-cluster and UWM), only for the prometheus container.
    • Alertmanager (in-cluster and UWM), only for the alertmanager container.
    • Thanos Querier, only for the thanos-query container.
    • Thanos Ruler, only for the thanos-ruler container.
  • We should extend that to all containers that potentially use too many resources.
  • Similar configuration options might be exposed for other components (subject for evaluation)
    • Node exporter
    • Kube state metrics
    • OpenShift state metrics
    • prometheus-adapter
    • prometheus-operator + admission webhook (platform and UWM)
    • telemeter-client

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Epic Goal

  • Today users can specify TopologySpreadConstraints for in-cluster Prometheus and alertmanager, and Thanos Ruler pods.
  • We should support setting these constraints on all pods that we deploy

Why is this important?

  • Users want to constrain pod scheduling based on their infrastructure. Currently users have the option to use
    • Node affinity. However we do not expose that field and we use it for our own purpose.
    • Node taints. Taints and tolerations lack the flexibility to specify preferred pod locations.
    • Node selectors. Node selectors have the same inflexibility as tolerations. If no node can be found, the pod is not scheduled.
  • Exposing TSC for all pods would allow users to control pod scheduling according to their own or preexisting infrastructure labels, while at the same time allowing the scheduler to deploy pods even if the constraints cannot be fulfilled.

Scenarios

  1. A user wants the Monitoring pods preferably scheduled on nodes labeled Infra, but wants them scheduled anywhere in case no nodes carry that label or they are discarded during scheduling for other reasons.

Acceptance Criteria

  • Users can configure TopologySpreadConstraints for all pods that CMO deploys
  • Unit tests are in place to confirm the config is propagated to the pod artifact
  • Documentation is changed to make clear which components can be configured with TopologySpreadConstraints

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Give users a TopologySpreadConstraints field in the AlertmanagerUserWorkloadConfig field and propagate this to the pod that is created.

Give users a TopologySpreadConstraints field in the KubeStateMetricsConfig field and propagate this to the pod that is created.

Give users a TopologySpreadConstraints field in the ThanosQuerierConfig field and propagate this to the pod that is created.

Give users a TopologySpreadConstraints field in the OpenShiftStateMetricsConfig field and propagate this to the pod that is created.

Give users a TopologySpreadConstraints field in the TelemeterClientConfig field and propagate this to the pod that is created.

Give users a TopologySpreadConstraints field in the PrometheusRestrictedConfig field and propagate this to the pod that is created.

Give users a TopologySpreadConstraints field in the PrometheusOperatorConfig field and propagate this to the pod that is created. This will take care of both the in-cluster PO and UWM PO.

Proposed title of this feature request

Enable the processes_linux collector in node_exporter

What is the nature and description of the request?

Enable node_exporter's processes_linux collector to allow customers to monitor the number of PIDs on OCP nodes.

Why does the customer need this? (List the business requirements)

They need to be able to monitor the number of PIDs on the OCP nodes.

List any affected packages or components.

cluster-monitoring-operator, node-exporter

We will add a section for "processes" Collector in "nodeExporter.collectors" section in CMO configmap. 

It has a boolean field "enabled", the default value is false.

apiVersion: v1
kind: ConfigMap
metadata: 
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data: 

  config.yaml: |

    nodeExporter: 
      collectors: 
        # enable a collector which is disabled by default
        processes: 
          enabled: true

 

To help customers with debugging, we need to be able to include noo pods and resources in the collected must-gather script.

To collect it, use:

oc adm must-gather

Issue and Design: https://github.com/ovn-org/ovn-kubernetes/blob/master/docs/design/shared_gw_dgp.md 

Upstream PR: https://github.com/ovn-org/ovn-kubernetes/pull/3160 

Document that describes how to use the mgmt port VF rep for hardware offloading: https://docs.google.com/document/d/1yR4lphjPKd6qZ9sGzZITl0wH1r4ykfMKPjUnlzvWji4/edit# 

==========================================================================

After the upstream PR has been merged, we need to find a way to make the user experience of configuring the mgmt port VF rep as streamlined as possible. The basic streamlining that we have committed to is improving the config map to only require the DP resource name with the MGMT VF in the pool. OVN-K will also need to make use of DP resources.

Description of problem:

- Add support for Dynamic Creation Of DPU/Smart-NIC Daemon Sets and Device-Plugin Resources For OVN-K
- DPU/Smart-NIC Daemonsets need a way to be dynamically created via specific node labels
- The config map needs to support device plugin resources (namely SR-IOV) to be used for the management port configuration in OVN-K
- This should enhance the performance of these flows (planned to be GA-ed in 4.14) for Smart-NIC
   5-a: Pod -> NodePort Service traffic (Pod Backend - Same Node)
   4-a: Pod -> Cluster IP Service traffic (Host Backend - Same Node)

Version-Release number of selected component (if applicable):

4.14.0 (Merged D/S) 
https://github.com/openshift/ovn-kubernetes/commit/cad6ed35183a6a5b43c1550ceb8457601b53460b
https://github.com/openshift/cluster-network-operator/commit/0bb035e57ac3fd0ef7b1a9451336bfd133fa8c1e 

How reproducible:

Never been supported in the past.

Steps to Reproduce:

Please follow the documentation on how to configure this on NVIDIA Smart-NICs in OvS HWOL mode.
 - https://issues.redhat.com/browse/NHE-550 

Please also check the OVN-K daemon sets. There should be a new "smart-nic" daemon set for OVN-K.
Please check on the nodes that the interface ovn-k8s-mp0_0 interface exists alongside ovn-k8s-mp0 interface.

Actual results:

Iperf3 performance:
  5-a: Pod -> NodePort Service traffic (Pod Backend - Same Node)    => ~22.5 Gbits/sec
  4-a: Pod -> Cluster IP Service traffic (Host Backend - Same Node) => ~22.5 Gbits/sec

Expected results:

Iperf3 performance:
 5-a: Pod -> NodePort Service traffic (Pod Backend - Same Node)    => ~29 Gbits/sec
 4-a: Pod -> Cluster IP Service traffic (Host Backend - Same Node) => ~29 Gbits/sec
As you can see we can gain an additional 6.5 Gbits/sec performance with these service flows.

Additional info:

https://docs.google.com/spreadsheets/d/1LHY-Af-2kQHVwtW4aVdHnmwZLTiatiyf-ySffC8O5NM/edit#gid=88193790
https://github.com/ovn-org/ovn-kubernetes/pull/3160

User Story

As an OCM team member, I want to provide support for cluster service and improve the usability and interoperability of HyperShift.

Acceptance Criteria

  • All the things that have to be done for the feature to be ready to
    release.

Default Done Criteria

  • All existing/affected SOPs have been updated.
  • New SOPs have been written.
  • Internal training has been developed and delivered.
  • The feature has both unit and end to end tests passing in all test
    pipelines and through upgrades.
  • If the feature requires QE involvement, QE has signed off.
  • The feature exposes metrics necessary to manage it (VALET/RED).
  • The feature has had a security review.
  • Contract impact assessment.
  • Service Definition is updated if needed.
  • Documentation is complete.
  • Product Manager signed off on staging/beta implementation.

Dates

Integration Testing:
Beta:
GA:

Current Status

GREEN | YELLOW | RED
GREEN = On track, minimal risk to target date.
YELLOW = Moderate risk to target date.
RED = High risk to target date, or blocked and need to highlight potential
risk to stakeholders.

References

Links to Gdocs, github, and any other relevant information about this epic.

Epic Goal

  • Update OpenShift components that are owned by the Builds + Jenkins Team to use Kubernetes 1.27

Why is this important?

  • Our components need to be updated to ensure that they are using the latest bug/CVE fixes, features, and that they are API compatible with other OpenShift components.

Acceptance Criteria

  • Existing CI/CD tests must be passing

User Story

As a developer I want to have my testing and build tooling managed in a consistent way to reduce the amount of context switching while doing maintenance work. 

Background

Currently our approach to managing and updating auxiliary tooling (such as envtest, controller-gen, etc.) is inconsistent. A fine pattern was introduced in the CPMS repo, which relies on the Go toolchain to update, vendor, and run this auxiliary tooling.

For CPMS context see: 

https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/main/tools/tools.go

https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/main/go.mod#L24

https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/main/Makefile#L19

Steps

  • Align envtest, controller-gen and other tooling management with pattern introduced within CPMS repo
  • Introduce an additional test which compares the envtest version (if envtest is in use within a particular repo) with the version of the k8s-related libraries in use. This will help us not forget to update envtest and other aux tools during dependency bumps.

Stakeholders

  • Cluster infra team

Definition of Done

  • All Cluster Infra Team owned repos updated and uses consistent pattern for auxiliary tools management
    • REPO LIST TBD, raw below
    • MAPI providers
    • MAO
    • CCCMO
    • CMA
  • Testing
  • Existing tests should pass
  • additional test for checking envtest version should be introduced

Background

Currently our approach to managing and updating auxiliary tooling (such as envtest, controller-gen, etc.) is inconsistent. A fine pattern was introduced in the CPMS repo, which relies on the Go toolchain to update, vendor, and run this auxiliary tooling.

For CPMS context see:

https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/main/tools/tools.go

https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/main/go.mod#L24

https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/main/Makefile#L19

Steps

  • Align envtest, controller-gen and other tooling management with pattern introduced within CPMS repo
  • Introduce an additional test which compares the envtest version (if envtest is in use within a particular repo) with the version of the k8s-related libraries in use. This will help us not forget to update envtest and other aux tools during dependency bumps.

Goal

This epic has 3 main goals

  1. Improve segment implementation so that we can easily enable additional telemetry pieces (hotjar, etc) for particular cluster types (starting with sandbox, maybe expanding to RHPDS). This will help us better understand where errors and drop off occurs in our trial and workshop clusters, thus being able to (1) help conversion and (2) proactively detect issues before they are "reported" by customers.
  2. Improve telemetry so we can START capturing console usage across the fleet
  3. Additional improvements to segment, to enable proper gathering of user telemetry and analysis

Problem

Currently we have no accurate telemetry of usage of the OpenShift Console across all clusters in the fleet. We should be able to utilize the auth and console telemetry to glean details which will allow us to get a picture of console usage by our customers.

Acceptance criteria

Let's do a spike to validate, and possibly have to update this list after the spike:

Need to verify HOW we define a cluster admin -> listing all namespaces in a cluster? Installing operators? Make sure that we consider OSD cluster admins as well (this should be aligned with how we send people to the dev perspective in my mind)

Capture additional information via console plugin ( and possibly the auth operator )

  1. Average number of users per cluster
  2. Average number of cluster admin users per cluster
  3. Average number of dev users per cluster
  4. Average # of page views across the fleet
  5. Average # of page views per perspective across the fleet
  6. # of cluster which have disabled the admin perspective for any users
  7. # of cluster which have disabled the dev perspective for any users
  8. # of cluster which have disabled the “any” perspective for any users
  9. # of clusters which have plugin “x” installed
  10. Total number of unique users across the fleet
  11. Total number of cluster admin users across the fleet
  12. Total number of developer users across the fleet

Dependencies (External/Internal):

Understanding how to capture telemetry via the console operator

Exploration:

Note:

We have removed the following ACs for this release:

  1. (p2) Average total active time spent per User in console (per cluster for all users)
    1. per Cluster Admins
    2. per non-Cluster Admins
  2. (p2) Average active time spent in Dev Perspective [implies we can calculate this for admin perspective]
    1. per Cluster Admins
    2. per non-Cluster Admins-
  3. (p3) Average # of times they change the perspective (per cluster for all users)

As Red Hat, we want to understand the usage of the (dev) console, for that, we want to add new Prometheus metrics (how many users have a cluster, etc.) and collect them later (as telemetry data) via cluster-monitoring-operator.

Either the console-operator or the cluster-monitoring-operator needs to apply a PrometheusRule to collect the right data and make it available later in Superset DataHat or Tableau.

Description

As RH PM/engineer, we want to understand the usage of the (dev) console, for that, we want to add new Prometheus metrics (how many users have a cluster, etc.) and collect them later (as telemetry data) via cluster-monitoring-operator.

Acceptance Criteria

  1. Add metrics that are collected in ODC-7232 to cluster-monitoring-operator so that we can get this data later in Superset DataHat or Tableau.

Additional Details:

Description of problem:
With 4.13 we added new metrics to the console (Epic ODC-7171 - Improved telemetry (provide new metrics)) that collect different user and cluster metrics.

The cluster metrics include:

  1. which perspectives are customized (enabled, disabled, only available for a subset of users)
  2. which plugins are installed and enabled

These metrics contain the perspective name or plugin name, which is unbounded: admins could configure any perspective or plugin name, even if no perspective or plugin with that name is available.

Based on the feedback in https://github.com/openshift/cluster-monitoring-operator/pull/1910 we need to reduce the cardinality and limit the metrics to, for example:

  1. perspectives: admin, dev, acm, other
  2. plugins: redhat, demo, other
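A minimal sketch of the kind of bucketing this implies, applied before a metric series is recorded (the known-name set and the demo criterion are assumptions for illustration, not the console's actual implementation):

package telemetry

import "strings"

// knownRedHatPlugins is an assumed allow-list, illustrative only.
var knownRedHatPlugins = map[string]bool{
	"logging-view-plugin": true,
	"crane-ui-plugin":     true,
	"acm":                 true,
	"mce":                 true,
}

// pluginBucket maps an arbitrary, unbounded plugin name onto a small,
// bounded set of label values.
func pluginBucket(name string) string {
	switch {
	case knownRedHatPlugins[name]:
		return "redhat"
	case strings.HasSuffix(name, "-demo"): // assumed convention for demo plugins
		return "demo"
	default:
		return "other"
	}
}

// perspectiveBucket does the same for perspective names.
func perspectiveBucket(id string) string {
	switch id {
	case "admin", "dev", "acm":
		return id
	default:
		return "other"
	}
}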

Version-Release number of selected component (if applicable):
4.13.0

How reproducible:
Always

Steps to Reproduce:
On a cluster, you must update the console configuration, configure some perspectives or plugins and check the metrics in Admin > Observe > Metrics:

avg by (name, state) (console_plugins_info)

avg by (name, state) (console_customization_perspectives_info)

On a local machine, you can use this console yaml:

apiVersion: console.openshift.io/v1
kind: ConsoleConfig
plugins: 
  logging-view-plugin: https://logging-view-plugin.logging-view-plugin-namespace.svc.cluster.local:9443/
  crane-ui-plugin: https://crane-ui-plugin.crane-ui-plugin-namespace.svc.cluster.local:9443/
  acm: https://acm.acm-namespace.svc.cluster.local:9443/
  mce: https://mce.mce-namespace.svc.cluster.local:9443/
  my-plugin: https://my-plugin.my-plugin-namespace.svc.cluster.local:9443/
customization: 
  perspectives: 
  - id: admin
    visibility: 
      state: Enabled
  - id: dev
    visibility: 
      state: AccessReview
      accessReview: 
        missing: 
          - resource: namespaces
            verb: get
  - id: dev1
    visibility: 
      state: AccessReview
      accessReview: 
        missing: 
          - resource: namespaces
            verb: get
  - id: dev2
    visibility: 
      state: AccessReview
      accessReview: 
        missing: 
          - resource: namespaces
            verb: get
  - id: dev3
    visibility: 
      state: AccessReview
      accessReview: 
        missing: 
          - resource: namespaces
            verb: get

And start the bridge with:

./build-backend.sh
./bin/bridge -config ../config.yaml

After that you can fetch the metrics in a second terminal:

Actual results:

curl -s localhost:9000/metrics | grep ^console_plugins

console_plugins_info{name="acm",state="enabled"} 1
console_plugins_info{name="crane-ui-plugin",state="enabled"} 1
console_plugins_info{name="logging-view-plugin",state="enabled"} 1
console_plugins_info{name="mce",state="enabled"} 1
console_plugins_info{name="my-plugin",state="enabled"} 1
curl -s localhost:9000/metrics | grep ^console_customization

console_customization_perspectives_info{name="dev",state="only-for-developers"} 1
console_customization_perspectives_info{name="dev1",state="only-for-developers"} 1
console_customization_perspectives_info{name="dev2",state="only-for-developers"} 1
console_customization_perspectives_info{name="dev3",state="only-for-developers"} 1

Expected results:
Lower cardinality; that is, results should be grouped somehow.

Additional info:

Goal:

This epic aims to address some of the RFEs associated with the Pipeline user experience.

Why is it important?

Improve the overall user experience when working with OpenShift Pipelines

Acceptance criteria:

  1. Users should be able to visually differentiate between canceled & failed pipelines in the Pipeline metrics tab
  2. Users should be able to see the duration of TaskRuns in the list view; this can be achieved via the Column management feature on the TaskRuns list page
  3. Users should be able to see the duration of TaskRuns in the TaskRun details view
  4. Users should be able to see the PipelineRun duration on the PipelineRun details page
  5. Users should be able to see a list of all PipelineRuns in their project from a PipelineRuns tab in the Dev perspective Pipeline page
  6. Users should be able to easily view webhook information on the Repository details page
  7. Users should be able to easily view webhook information on the summary page

Dependencies (External/Internal):

None

Exploration:

Exploration is available in this Miro board

Description

As a user, I want to see the PipelineRuns present in the current namespace from the Dev perspective

Acceptance Criteria

  1. Should add a PipelineRuns tab after the Pipeline tab on the dev perspective Pipeline page
  2. Should list all the PipelineRuns present in the namespace
  3. Should add the Create PipelineRun option in the  Create action menu

Additional Details:

Description

As a user, I want to see the webhook link and webhook secret on the Repository details page and the webhook link on the Repository summary page

Acceptance Criteria

  1. Should add the webhook link on the Repository details page
  2. should add the webhook secret on the Repository details page
  3. should show the webhook link and secret only if the Repository has been created using the Setup a Webhook option
  4. should add the webhook link on the Repository  summary page   

Additional Details:

Description

As a user, I want to see the information about the cancelled pipeline on the Pipeline metrics page

Acceptance Criteria

  1. should show the cancelled status in a different color in the Pipeline Success Ratio donut chart.

Additional Details:

Description

As a user, I want to see the duration on the details page of PipelineRun and TaskRun

Acceptance Criteria

  1. should show PipelineRun duration on the details page
  2. should show TaskRun duration on the details page

Additional Details:

Description

As a user, I want to manage the columns available on the TaskRuns list page

Acceptance Criteria

  1. should provide a manage columns option on the TaskRuns details page
  2. By default, the Duration column should not be present and the user can make it visible by using manage columns option

Additional Details:

Description

With many PipelineRuns based on the same pipeline, it will get confusing if re-runs are named by the pipeline as they will all be named similarly. Losing the distinction between PipelineRuns will cause lots of additional hassles.

Acceptance Criteria

  1. Prefix the last run name into the newly created PipelineRun  
  2. Validate the existing e2e and add new e2e tests if needed

Additional Details:

Problem:

ODC tests mainly run with kube:admin (cluster-admin privileges), which creates an issue when something gets broken due to an RBAC issue

Goal:

To define some basic tests focused on the self-provisioner users which can also be run on CI

Why is it important?

Testing with non-admin users, since PR changes should not break the UI for them

Use cases:

  1. <case>

Acceptance criteria:

  1. Collect requirements for user tests
  2. Write some basic tests for different packages
  3. Run tests with non-admin users locally
  4. Run tests with CI

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description

Running pull CI tests for devconsole, pipelines, and knative packages with non-admin user

Acceptance Criteria

  1. <criteria>

Additional Details:

Problem:

ODC E2E tests have flakes which create failures on CI.

Goal:

Reduce ODC E2E test flakes by stabilising the tests and improving the speed of test execution.

Why is it important?

To improve the health of CI, which will improve PR review effectiveness.

Use cases:

  1. <case>

Acceptance criteria:

  1. Improving E2E test of Pipelines and Knative Packages
  2. Improving E2E test of Topology and Helm Packages
  3. Improving E2E test of Dev-Console Package
  4. Running the improved test against CI

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

Description

Automation coverage for customization of the developer view

Acceptance Criteria

  1. Customisation of developer catalog and Add page through form view

Additional Details:

Description

Skip waiting for the authentication operator to start progressing when the secret already exists

For periodic jobs, our tests are appended to the existing console tests. Because the value of `waitForAuthOperatorProgressing` changes from true to false at the start of the console tests, and our tests follow the same procedure, they keep waiting for its value to become true, which never happens, so the tests never start.

Problem:

Improve CI coverage for E2E tests and test stabilization for better product health

Goal:

Improving the health of CI and devconsole which will impact PR review effectiveness.

Why is it important?

Use cases:

  1. <case>

Acceptance criteria:

  1. Running more tests against CI
  2. Improving E2E test of Pipelines, Knative, Topology, Helm, Shipwright, Web-console and DevConsole Packages
  3. Make tests more effective and bug-free

Dependencies (External/Internal):

Design Artifacts:

Exploration:

Note:

This is a clone of issue OCPBUGS-17203. The following is the description of the original issue:

Description of problem:

Getting rate limit issue and other failures while running "test serverless function" tests

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Upstream repos which contribute to the OLM v0 downstream repo have a 90+ commit delta, with several substantial dependency version bumps.

The interaction between these repos necessitates a coordinated solution, and potentially new upstream contributions to reach dependency equilibrium before bringing downstream. 

The goals of this epic are:

  1. to attempt a bulk sync of the upstream contributing repositories, bringing all commits downstream in accordance with the OLMv0 downstreaming doc
  2. identify impediments to the downstream, and capture a list of remediating actions to be taken both up/downstream
  3. coordinate across teams (OPRUN, OPECO, QE) to resolve the impediments and handle test impacts

 

We have some existing work in this direction, and this epic is mostly to coordinate across teams.  As a result, some existing stories will need some remodeling as we go, and teams should feel free to keep them up to date to reflect the identified work.

 

 

 

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

The openshift/operator-framework-olm repository is very out-of-date, and needs to be sync'd from upstream.

Acceptance Criteria:

All upstream necessary commits from:

  • operator-framework/operator-lifecycle-manager
  • operator-framework/operator-registry
  • operator-framework/api

are merged into the openshift/operator-framework-olm repository.

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Any ERRORs produced by TuneD will result in Degraded Tuned Profiles. Clean up upstream and NTO/PPC-shipped TuneD profiles and add ways of limiting the ERROR message count.
  • Review the policy of restarting TuneD on errors every resync period.  See: OCPBUGS-11150

Why is this important?

  •  

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://issues.redhat.com/browse/PSAP-908

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of problem:

The CU cluster of the Mavenir deployment has cluster-node-tuning-operator in a CrashLoopBackOff state and does not apply the performance profile

Version-Release number of selected component (if applicable):

4.14rc0 and 4.14rc1

How reproducible:

100%

Steps to Reproduce:

1. Deploy the CU cluster with the ZTP GitOps method
2. Wait for the Policies to be compliant
3. Check the worker nodes and cluster-node-tuning-operator status

Actual results:

Nodes do not have the performance profile applied
cluster-node-tuning-operator is crashing with the following in its logs:

E0920 12:16:57.820680       1 runtime.go:79] Observed a panic: &runtime.TypeAssertionError{_interface:(*runtime._type)(nil), concrete:(*runtime._type)(nil), asserted:(*runtime._type)(0x1e68ec0), missingMethod:""} (interface conversion: interface is nil, not v1.Object)
goroutine 615 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1c98c20?, 0xc0006b7a70})
        /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000d49500?})
        /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x1c98c20, 0xc0006b7a70})
        /usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/openshift/cluster-node-tuning-operator/pkg/util.ObjectInfo({0x0?, 0x0})
        /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/util/objectinfo.go:10 +0x39
github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*ProfileCalculator).machineConfigLabelsMatch(0xc000a23ca0?, 0xc000445620, {0xc0001b38e0, 0x1, 0xc0010bd480?})
        /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/profilecalculator.go:374 +0xc7
github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*ProfileCalculator).calculateProfile(0xc000607290, {0xc000a40900, 0x33})
        /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/profilecalculator.go:208 +0x2b9
github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).syncProfile(0xc000195b00, 0x0?, {0xc000a40900, 0x33})
        /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:664 +0x6fd
github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).sync(0xc000195b00, {{0x1f48661, 0x7}, {0xc000000fc0, 0x26}, {0xc000a40900, 0x33}, {0x0, 0x0}})
        /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:371 +0x1571
github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).eventProcessor.func1(0xc000195b00, {0x1dd49c0?, 0xc000d49500?})
        /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:193 +0x1de
github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).eventProcessor(0xc000195b00)
        /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:212 +0x65
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
        /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x224ee20, 0xc000c48ab0}, 0x1, 0xc00087ade0)
        /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc0004e6710?)
        /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0xc0004e67d0?, 0x91af86?, 0xc000ace0c0?)
        /go/src/github.com/openshift/cluster-node-tuning-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x25
created by github.com/openshift/cluster-node-tuning-operator/pkg/operator.(*Controller).run
        /go/src/github.com/openshift/cluster-node-tuning-operator/pkg/operator/controller.go:1407 +0x1ba5
panic: interface conversion: interface is nil, not v1.Object [recovered]
        panic: interface conversion: interface is nil, not v1.Object

Expected results:

cluster-node-tuning-operator is functional, performance profiles applied to worker nodes

Additional info:

There is no issue on a DU node of the same deployment coming from the same repository; the DU node is configured as requested and cluster-node-tuning-operator is functioning correctly.

must gather from rc0: https://drive.google.com/file/d/1DlzrjQiKTVnQKXdcRIijBkEKjAGsOFn1/view?usp=sharing
must gather from rc1: https://drive.google.com/file/d/1qSqQtIunQe5e1hDVDYwa90L9MpEjEA4j/view?usp=sharing

performance profile: https://gitlab.cee.redhat.com/agurenko/mavenir-ztp/-/blob/airtel-4.14/policygentemplates/group-cu-mno-ranGen.yaml
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

The Kube APIServer has a sidecar to output audit logs. We need similar sidecars for other APIServers that run on the control plane side. We also need to pass the same audit log policy that we pass to the KAS to these other API servers.
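A minimal sketch of what such a sidecar can look like (the container name, command, and log path are assumptions for illustration, not the actual HyperShift manifests):

package apiserver

import corev1 "k8s.io/api/core/v1"

// auditLogSidecar streams an API server's audit log file (shared via a
// volume with the API server container) to stdout so it can be collected
// like any other container log.
func auditLogSidecar(image string) corev1.Container {
	return corev1.Container{
		Name:    "audit-logs",
		Image:   image,
		Command: []string{"/usr/bin/tail", "-c+1", "-F", "/var/log/openshift-apiserver/audit.log"},
		VolumeMounts: []corev1.VolumeMount{{
			Name:      "audit-logs",
			MountPath: "/var/log/openshift-apiserver",
		}},
	}
}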

During a PerfScale 80 HC test in stage we found that the OBO prometheus monitoring stack was consuming 50G of memory (enough to cause OOMing on the m5.4xlarge instance it was residing on). Additionally, during this time it would also consume over 10 CPU cores. 

Snapshot of the time leading up to (effectively idle) and during the test: https://snapshots.raintank.io/dashboard/snapshot/2K5s0PzaN1U2JE1jrxTPZ5jX0fifBuRC 

As a SRE, I want to have the ability to filter metrics exposed from the Management Clusters.

Context:
RHOBS resources allocated to HCP are scarce. Currently, we push every single metric to the RHOBS instance.
However, in https://issues.redhat.com/browse/OSD-13741, we've identified a subset of metrics that are important to SRE.

The ability to only export those metrics to RHOBS will reduce significantly the cost of monitoring as well as increase our ability to scale RHOBS.

As discussed in this Slack thread, most of the CPU and memory consumption of the OBO operator is caused at scraping time.

The idea here is to make sure the hypershift & control-plane-operator operators no longer specify the scrape interval in ServiceMonitor & PodMonitor scrape configs (unless there is a very good reason to do so).

Indeed, when the scrape interval is not specified at scrape config level, the global scrape interval specified at the root of the config is used. This offers the following benefits:

  • The interval can be set for all scrape configs at once.
  • The interval is no longer hard-coded in HyperShift code
  • The interval can be set to a higher value.
    This will allow reducing the quantity of data scraped by Prometheus and consequently lower its memory consumption.
    See the next sub-task, which will set the global scrape interval to 60 sec

This is part of solution #1 described here.
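As an illustration of the intended shape, a ServiceMonitor can simply leave the per-endpoint interval unset so Prometheus falls back to the global scrape interval. A minimal sketch; the selector labels and port name are assumptions, not the actual HyperShift objects:

package monitoring

import (
	monitoringv1 "github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// kasServiceMonitor returns a ServiceMonitor whose endpoint deliberately
// omits Interval, so the global scrape interval applies.
func kasServiceMonitor(namespace string) *monitoringv1.ServiceMonitor {
	return &monitoringv1.ServiceMonitor{
		ObjectMeta: metav1.ObjectMeta{Name: "kube-apiserver", Namespace: namespace},
		Spec: monitoringv1.ServiceMonitorSpec{
			Selector: metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "kube-apiserver"}, // assumed label
			},
			Endpoints: []monitoringv1.Endpoint{{
				Port: "client", // assumed port name; Interval intentionally left unset
			}},
		},
	}
}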

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

When quorum breaks and we are able to get a snapshot of one of the etcd members, we need a procedure to restore the etcd cluster for a given HostedCluster. 

Documented here: https://docs.google.com/document/d/1sDngZF-DftU8_oHKR70E7EhU_BfyoBBs2vA5WpLV-Cs/edit?usp=sharing 

Add the above documentation to the HyperShift repo documentation.

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Problem statement:

Many internal projects rely on Red Hat's fork of the OAuth2 Proxy project. The fork differs from the main upstream project in that it added an OpenShift authentication backend provider, allowing the OAuth2 Proxy service to use the OpenShift platform as an authentication broker.

Still, unfortunately, it had never been contributed back to the upstream project - this caused both of the projects, the fork and the upstream, to severely diverge. The fork is also extremely outdated and lacks features.

Among such features not present in the forked version is the support for setting up a timeout for requests from the proxy to the upstream service, otherwise controlled using the --upstream-timeout command-line switch in the official OAuth2 Proxy project.

Without the ability to specify the timeout explicitly, the default value of 30 seconds is assumed (coming from Go's libraries), and this is often not enough to serve a response from a busy backend.

Thus, we need to backport this feature from the upstream project.
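Conceptually, the switch bounds how long the proxy waits for the upstream to respond. A minimal, hedged sketch of a reverse proxy with a configurable upstream response timeout (this is not the oauth2-proxy code itself; the addresses and the 120s value are placeholders):

package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

func main() {
	upstream, err := url.Parse("http://127.0.0.1:8080") // placeholder upstream address
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	// Plays the role of --upstream-timeout: how long to wait for the
	// upstream's response headers before giving up on the request.
	proxy.Transport = &http.Transport{ResponseHeaderTimeout: 120 * time.Second}
	log.Fatal(http.ListenAndServe(":4180", proxy))
}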

Resources:

Implementation ideas:

Backport the Pull Request from the upstream project into the Red Hat's fork.

Acceptance Criteria:

  • The fork of the OAuth2 Proxy can now use the --upstream-timeout command-line switch to set the desired timeout.
  • A new container image has been built and uploaded to Quay so that it can be pulled when services are deployed into our OpenShift clusters.

Default Acceptance Criteria:

  • Any relevant documentation and SOPs are updated or written.
  • The code (i.e. the backport) has sufficient test coverage.

Goal: Support OVN-IPsec on IBM Cloud platform.

Why is this important: IBM Cloud is being added as a new OpenShift supported platform, targeting 4.9/4.10 GA.

Dependencies (internal and external):

Prioritized epics + deliverables (in scope / not in scope):

  • Need to have permission to spin up IBM clusters

Not in scope:

Estimate (XS, S, M, L, XL, XXL):

Previous Work:

Open questions:

Acceptance criteria:

Epic Done Checklist:

  • CI - CI Job & Automated tests: <link to CI Job & automated tests>
  • Release Enablement: <link to Feature Enablement Presentation> 
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
  • Notes for Done Checklist
    • Adding links to the above checklist with multiple teams contributing; select a meaningful reference for this Epic.
    • Checklist added to each Epic in the description, to be filled out as phases are completed - tracking progress towards “Done” for the Epic.

Epic Goal

OpenShift Container Platform is shipping a finely tuned set of alerts to inform the cluster's owner and/or operator of events and bad conditions in the cluster.

Runbooks are associated with alerts and help SREs take action to resolve an alert. This is critical to share engineering best practices following an incident.

Goal 1: Current alerts/runbooks for hypershift needs to be evaluated to ensure we have sufficient coverage before hypershift hits GA.

Goal 2: Actionable runbooks need to be provided for all alerts; therefore, we should attempt to cover as many as possible in this epic.

Goal 3: Continue adding alerts/runbooks to cover existing OVN-K functionality.

This epic will NOT cover refactors needed to alerts/runbooks due to new arch (OVN IC).

Why is this important?

In order to scale, we (engineering) must share our institutional knowledge.

In order for SREs to respond to alerts, they must have the knowledge to do so.

SD needs actionable runbooks to respond to alerts; otherwise, they will require engineering to engage more frequently.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

legacy apiserver disruption

legacy network pod sandbox creation

kubelet logs through /api/v1/nodes/<node>/proxy/logs/

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

Failing tests 

 

  • [sig-cli] oc adm must-gather runs successfully [Suite:openshift/conformance/parallel]
  • [sig-cli] oc adm must-gather runs successfully with options [Suite:openshift/conformance/parallel]
  • [sig-cli] oc adm must-gather when looking at the audit logs [sig-node] kubelet runs apiserver processes strictly sequentially in order to not risk audit log corruption [Suite:openshift/conformance/parallel]

There are quite a few tests which depend on API groups that do not exist in MicroShift. We can add the [apigroup] annotation to skip these tests.

[apigroup:oauth.openshift.io]

"[sig-auth][Feature:OAuthServer] OAuthClientWithRedirectURIs must validate request URIs according to oauth-client definition": " [Suite:openshift/conformance/parallel]"

"[sig-auth][Feature:OAuthServer] well-known endpoint should be reachable [apigroup:route.openshift.io]": " [Suite:openshift/conformance/parallel]" 

[apigroup:operator.openshift.io]

"[sig-network][Feature:MultiNetworkPolicy][Serial] should enforce a network policies on secondary network IPv4": " [Suite:openshift/conformance/serial]"

"[sig-network][Feature:MultiNetworkPolicy][Serial] should enforce a network policies on secondary network IPv6": " [Suite:openshift/conformance/serial]"

"[sig-storage][Feature:DisableStorageClass][Serial] should not reconcile the StorageClass when StorageClassState is Unmanaged": " [Suite:openshift/conformance/serial]"
"[sig-storage][Feature:DisableStorageClass][Serial] should reconcile the StorageClass when StorageClassState is Managed": " [Suite:openshift/conformance/serial]",
"[sig-storage][Feature:DisableStorageClass][Serial] should remove the StorageClass when StorageClassState is Removed": " [Suite:openshift/conformance/serial]", 

Five tests fail because the system:authenticated group does not have enough permissions on some resources (routes and configmaps).

"[sig-cli] oc basics can create and interact with a list of resources [Suite:openshift/conformance/parallel]"

"[sig-cli] oc basics can show correct whoami result [Suite:openshift/conformance/parallel]"

"[sig-cli] oc can route traffic to services [apigroup:route.openshift.io] [Suite:openshift/conformance/parallel]"

"[sig-cli] oc expose can ensure the expose command is functioning as expected [apigroup:route.openshift.io] [Suite:openshift/conformance/parallel]"

"[sig-network-edge][Feature:Idling] Idling with a single service and ReplicationController should idle the service and ReplicationController properly [Suite:openshift/conformance/parallel]"

sig-cli is failing in two different ways:

  • Missing api resources from api groups.
  • A bug where the loop variables are not captured in closures, causing random errors on each execution because they get overwritten for past It functions.
"[sig-auth][Feature:Authentication]  TestFrontProxy should succeed [Suite:openshift/conformance/parallel]" 

This test is failing because it depends on "aggregator-client" secret, which is not present in MicroShift. We can skip this test. 

The goal of this EPIC is to solve several issues related to PDBs causing issues during OCP upgrades, especially when new apiservers (which roll out one by one) were wedged (there was an issue with networking on new pods due to RHEL upgrades).

slack thread: https://redhat-internal.slack.com/archives/CC3CZCQHM/p1673886138422059

Epic Goal*

This is a tracking issue for the Workloads related work for Microshift 4.13 Improvements. See API-1506 for the whole feature.

followup to https://issues.redhat.com/browse/WRKLDS-487

  • Refactor route-controller-manager to use NewControllerCommandConfig and ControllerBuilder from library-go. Then update the dependency in MicroShift so we can pass LeaderElection.Disable in the config to disable leader election, as it is not needed in MicroShift.
  • OCMO refactoring/separating status conditions. Create two instances of NewDeploymentController to create separate status conditions.

Refactor route-controller-manager to use NewControllerCommandConfig and ControllerBuilder from library-go. Then update the dependency in MicroShift so we can pass LeaderElection.Disable in the config to disable leader election, as it is not needed in MicroShift.

Incomplete Epics

This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were not completed when this image was assembled

Epic Goal

This epic tracks any part of our codebase / solutions we implemented taking shortcuts.
Whenever a shortcut is taken, we should add a story here so we don't forget to improve it in a safer and more maintainable way.

Why is this important?

Maintainability and debuggability, and fighting technical debt in general, are critical to keeping velocity and ensuring overall high quality.

Scenarios

  1. N/A

Acceptance Criteria

  • depends on the specific card

Dependencies (internal and external)

  • depends on the specific card

Previous Work (Optional):

https://issues.redhat.com/browse/CNF-796
https://issues.redhat.com/browse/CNF-1479 
https://issues.redhat.com/browse/CNF-2134
https://issues.redhat.com/browse/CNF-6745
https://issues.redhat.com/browse/CNF-8036
https://issues.redhat.com/browse/CNF-9566 

 Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Description of problem:

According to the API documentation, the policyTypes field is optional:
 https://docs.openshift.com/container-platform/4.11/rest_api/network_apis/networkpolicy-networking-k8s-io-v1.html#specification

If this field is not specified, it will default based on the existence of Ingress or Egress rules;
but if policyTypes is not specified, all traffic is dropped regardless of what is stated in the rule.
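For reference, the documented defaulting (which MultiNetworkPolicy mirrors from NetworkPolicy) can be sketched as follows; this illustrates the expected behavior, not the plugin's implementation:

package policy

import networkingv1 "k8s.io/api/networking/v1"

// effectivePolicyTypes sketches the documented default: Ingress is always
// implied when policyTypes is omitted, and Egress is implied only when
// egress rules are present.
func effectivePolicyTypes(spec networkingv1.NetworkPolicySpec) []networkingv1.PolicyType {
	if len(spec.PolicyTypes) > 0 {
		return spec.PolicyTypes
	}
	types := []networkingv1.PolicyType{networkingv1.PolicyTypeIngress}
	if len(spec.Egress) > 0 {
		types = append(types, networkingv1.PolicyTypeEgress)
	}
	return types
}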

 

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1. Configure sriov (nodepolicy + sriovnetwork)
2. Configure 2 pods
3. enable MutiNetworkPolicy
4. apply  MutiNetworkPolicy:
spec:
  podSelector:
    matchLabels:
      pod: pod1
  ingress:
  - from:
    - ipBlock:
        cidr: 192.168.0.2/32
5. send traffic between pods (192.168.0.2 => pod=pod1)

Actual results:

traffic dropped 

Expected results:

traffic passed

Additional info:

 

An epic we can duplicate for each release to ensure we have a place to catch things we ought to be doing regularly but can tend to fall by the wayside.

Console-operator should switch from using bindata to using assets, similar to what cluster-kube-apiserver-operator and other operators are doing, so we don't need to regenerate the bindata when YAML files change.

There is also an issue with generating bindata on ARM and other architectures; switching to assets will make it obsolete.

 

https://github.com/openshift/cluster-kube-apiserver-operator/blob/005a95607cf9f8db490e962b549811d8bc0c5eaf/bindata/assets.go
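The assets approach generally builds on go:embed, so YAML changes are picked up at build time without a code-generation step. A minimal sketch with an assumed directory layout:

package assets

import "embed"

//go:embed manifests/*.yaml
var content embed.FS

// MustAsset reads an embedded manifest and panics if it is missing, which
// mirrors how the generated bindata accessors were typically used.
func MustAsset(name string) []byte {
	data, err := content.ReadFile(name)
	if err != nil {
		panic(err)
	}
	return data
}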

Placeholder epic to track spontaneous tasks which do not deserve their own epic.

ServicePublishingStrategy of type LoadBalancer or Route could specify the same hostname, which will result in one of the services not being published, i.e. no DNS records created.
context: https://redhat-internal.slack.com/archives/C04EUL1DRHC/p1678287502260289
 
DOD:
Validate ServicePublishingStrategy and report conflicting service hostnames
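A minimal sketch of the kind of check this implies, over a hypothetical service-to-hostname mapping (the types and inputs are assumptions for illustration, not the HyperShift API):

package validation

import "fmt"

// findConflictingHostnames returns an error for every hostname requested by
// more than one service publishing strategy. serviceHostnames maps a service
// name to its requested hostname; an empty hostname means none was requested.
func findConflictingHostnames(serviceHostnames map[string]string) []error {
	var errs []error
	seen := map[string]string{} // hostname -> first service that claimed it
	for svc, hostname := range serviceHostnames {
		if hostname == "" {
			continue
		}
		if first, ok := seen[hostname]; ok {
			errs = append(errs, fmt.Errorf("services %q and %q both request hostname %q", first, svc, hostname))
			continue
		}
		seen[hostname] = svc
	}
	return errs
}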

AC:

We have the connectDirectlyToCloudAPIs flag in the konnectivity SOCKS5 proxy to dial directly to cloud providers without going through konnectivity.

This introduces another path for exceptions https://github.com/openshift/hypershift/pull/1722

We should consolidate both by keeping connectDirectlyToCloudAPIs until there's a reason not to.

 

OCP components could change their image key in the release payload, which might not be immediately visible to us and would break Hypershift. 

 
DOD:
Validate release contains all the images required by Hypershift and report missing images in a condition

Once the HostedCluster and NodePool are paused using the PausedUntil statement, the awsprivatelink controller will continue reconciling.

 

How to test this:

  • Deploy a private cluster
  • Put it in pause once deployed
  • Delete the AWSEndPointService and the Service from the HCP namespace
  • Wait for a reconciliation; the result is that they should not be recreated
  • Unpause it and wait for recreation.

DoD:

This feature is supported by ROSA.

To have an e2e to validate publicAndPrivate <-> Private in the presubmits.

DoD:

At the moment, if the input etcd KMS encryption (key and role) is invalid, we fail silently (the error is not surfaced).

We should check that both the key and the role are compatible/operational for a given cluster and fail in a condition otherwise.

DoD:

If a NodePool is changed from having .replicas to autoscaler min/max and the min is set beyond the current replicas, that might leave the MachineDeployment in a state not suitable for autoscaling. This requires the consumer to ensure the min is <= the current replicas, which is poor UX. Ideally we should be able to automate this.

The HyperShift operator deployment fails when we try to deploy it on the RootCI server, which has PSA enabled, so we need to make the HyperShift operator deployment compliant with the restricted PSA profile.

Event:

0s          Warning   FailedCreate        replicaset/operator-66cc5794c9       (combined from similar events): Error creating: pods "operator-66cc5794c9-k2sq7" is forbidden: violates PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "operator" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "operator" must set securityContext.capabilities.drop=["ALL"]), seccompProfile (pod or container "operator" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost") 
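A minimal sketch of a container securityContext that addresses the violations listed in the event (illustrative; the actual HyperShift manifests may set additional fields):

package deploy

import corev1 "k8s.io/api/core/v1"

// restrictedSecurityContext satisfies the restricted PodSecurity profile
// checks called out in the event above.
func restrictedSecurityContext() *corev1.SecurityContext {
	allowPrivilegeEscalation := false
	runAsNonRoot := true
	return &corev1.SecurityContext{
		AllowPrivilegeEscalation: &allowPrivilegeEscalation,
		RunAsNonRoot:             &runAsNonRoot,
		Capabilities: &corev1.Capabilities{
			Drop: []corev1.Capability{"ALL"},
		},
		SeccompProfile: &corev1.SeccompProfile{
			Type: corev1.SeccompProfileTypeRuntimeDefault,
		},
	}
}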

AWS has a hard limit of 100 OIDC providers globally. 
Currently each HostedCluster created by e2e creates its own OIDC provider, which results in hitting the quota limit frequently and causing the tests to fail as a result.

 
DOD:
Only a single OIDC provider should be created and shared between all e2e HostedClusters. 

Most of our condition statuses are driven by the programmatic output of reconciliation loops.

E.g: the HostedCluster available

  • depends on kas, etcd and infra conditions.
  • For kas/etcd we check the Deployment/stateful resource healthy

This is a good signal for day 1, but we might be missing relevant real state of the world for day 2. E.g:

  • Do we flip the HCAvailable condition if our ingress controller is deleted/unhealthy?
  • Do we flip the HCAvailable condition if a Route resource is deleted?
  • Do we flip the HCAvailable condition if the LB is deleted out of band?

DoD:

Reproduce and review behaviour the examples above.

Consider adding additional knowledge for computing the HCAvailable condition. Health-check the expected day-2 holistic e2e behaviour rather than the particular status of subcomponents.

E.g. actually query the kas through the url we expose

This is a placeholder to capture the necessary CI changes to do every release cut.

There are a few places in our CI config which require pinning to the new release at every release cut:

DOD:

Make sure we have this documented in hypershift repo and that all needed is done for current release branch.

Background

This is intended to be a place to capture general "tech debt" items so they don't get lost. I very much doubt that this will ever get completed as a feature, but that's okay; the desire is more that stories get pulled out of here and put with feature work "opportunistically" when it makes sense.

Goal

If you find a "tech debt" item, and it doesn't have an obvious home with something else (e.g. with MCO-1 if it's metrics and alerting) then put it here, and we can start splitting these out/marrying them up with other epics when it makes sense.

 

The MCD reads from and writes to the journal. We should look to remove unnecessary reads from the journal and just log important info, so a broken journal doesn't break the MCD.

 

Spun off of https://issues.redhat.com/browse/OCPBUGS-8716

Requires MCO-595 and MCO-596 to be finished first

The MCD today writes pending configs to journal, which the next boot then uses to read the state.

 

This is mostly redundant since we also read/write the updated config to disk. The pending config was originally implemented very early on, and today causes more trouble than it helps, since the journal could be broken, or the config could not be found, which is very troublesome to debug and recover.

 

We should remove the workflow entirely

This has been broken for a long time, and the actual functionality is quite useless. We have put out a deprecation notice in 4.12, and now we should look to remove it.

Currently, adding a force file (/run/machine-config-daemon-force) will start an update, but it doesn't necessarily do a complete upgrade; if the update fits into one of the carve-outs we have for a rebootless update (e.g. OSImageURL is the same), it won't do an OS update. We have had a few customers whose clusters are stuck in a quasi state and need a complete OS upgrade, even if the "conditions" on the cluster indicate that this isn't necessary.

The goal of this story is to update this behavior so that it will also do an OS upgrade (execute applyOSChanges() in its entirety).

Background

As part of https://github.com/openshift/machine-config-operator/pull/3270, Joel moved us to ConfigMapsLeases for our lease because the old way of using ConfigMaps was being deprecated in favor of the "Leases" resource.

ConfigMapsLeases were meant to be the first phase of the migration, eventually ending up on LeasesResourceLock, so at some point we need to finish.

Since we've already had ConfigMapsLeases for at least a release, we should now be able to complete the migration by changing the type of resource lock here https://github.com/openshift/machine-config-operator/blob/4f48e1737ffc01b3eb991f22154fc3696da53737/cmd/common/helpers.go#L43 to LeasesResourceLock
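A minimal sketch of the change, using client-go's resourcelock package (the namespace, lease name, and identity are placeholders for illustration):

package leaderelection

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// newLeaseLock builds the leader-election lock using the Leases resource
// instead of the transitional ConfigMapsLeases lock type.
func newLeaseLock(client kubernetes.Interface, namespace, name, identity string) (resourcelock.Interface, error) {
	return resourcelock.New(
		resourcelock.LeasesResourceLock, // was resourcelock.ConfigMapsLeasesResourceLock
		namespace,
		name,
		client.CoreV1(),
		client.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: identity},
	)
}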

We should probably also clean up after ourselves so nobody has to open something like https://bugzilla.redhat.com/show_bug.cgi?id=1975545 again

(Yes this really should be as easy as it looks, but someone needs to pay attention to make sure something weird doesn't happen when we do it)

 

Some supporting information is here, if curious:

https://github.com/kubernetes/kubernetes/pull/106852

https://github.com/kubernetes/kubernetes/issues/80289

 

Goal

Finish lease lock type migration by changing lease lock type to LeaseResourceLock

Done When

  • MCO is no longer using ConfigMapsLeases
  • No weird/unexplainable timings/errors are introduced
  • Tests pass

 

 

Epic Goal

As an OpenShift infrastructure owner, I want to use the Zero Touch Provisioning flow with RHACM, where RHACM is in a dual-stack hub cluster and the deployed cluster is an IPv6-only cluster.

Why is this important?

Currently ZTP doesn't work when provisioning IPv6 clusters from a dual-stack hub cluster. We have customers who aim to deploy new clusters via ZTP that don't have IPv4 and work exclusively over IPv6. To enable this use case, work on the metal platform has been identified as a requirement.

Dependencies

Converge IPI and ZTP Boot Flows: METAL-10

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

     

 

Epic Goal

  • Currently, we are polling events from assisted-service, enriching the events, and pushing them to Elastic in the event scrape service.
    In order to also support sending events from on-prem environments, we need to remodel the data pipelines towards a push-based model. Since we'll benefit from this approach in the SaaS environment as well, we'll seek a model as unified as possible.

Why is this important?

  • Support on-prem environments
  • Increase efficiency (we'll stop performing thousands of requests per minute to the SaaS)
  • Enhance resilience (right now if something fails, we have a relatively short time window to fix it before we lose data)

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. Make a decision on what design to implement (internal)
  2. Authorization with pull-secret (TBD, is there a ticket for this? Oved Ourfali )
  3. RH Pipelines RHOSAK implementation

Previous Work (Optional):

  1. First analysis
  2. We then discussed the topic extensively: Riccardo Piccoli Igal Tsoiref Michael Levy liat gamliel Oved Ourfali Juan Hernández 
  3. We explored already existing systems that would support our needs, and we found that RH Pipelines almost exactly matches them:
  • Covers auth needed from on prem to the server
  • Accepts HTTP-based payload and files to be uploaded (very handy for bulk upload from on-prem)
  • Lacks routing: limits our ability to scale data processing horizontally
  • Lacks infinite data retention: the original design has kafka infinite retention as key characteristic
  1. We need to evaluate requirements and options we have to implement the system. Another analysis with a few alternatives

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Roadmap

  • Stream events from service to kafka
  • Enable feature flag hiding this feature in staging to gather data
  • Read events and project them to elasticsearch
  • Process on-prem events and re-stream them into the kafka stream
  • Adapt CCX export

Due to the change of Kafka provider, SASL/PLAIN is no longer supported.

We now need SASL/SCRAM for the app-interface integrated MSK.

We are missing event notifications on the creation of some resources. We need to make sure notifications are sent for them.

Epic Goal

  • Assisted installer should give a formal infraenv kube-api for adding additional certs to trust

Why is this important?

  • Users that install OCP on servers that communicate through transparent proxies must trust the proxy's CA for the communication to work
  • The only way users can currently do that is by using both infraenv ignition overrides and install-config overrides. These are generic messy APIs that are very error prone. We should give users a more formal, simpler API to achieve both at the same time. 

Scenarios

  1. Day 1 - discovery ISO OS should trust the bundles the user gives us as an infraenv creation param (either via REST or kube-api). A cluster formed from hosts should trust all certs from all infraenvs of all of its hosts combined.
  2. Day 2 - obviously we don't want to modify existing clusters to trust the cert bundles of infra-envs of hosts that want to join them, so we will simply not handle this case. 

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Feature goal (what are we trying to solve here?)

Do not change the cluster platform in the background due to networking configuration.

DoD (Definition of Done)

Remove user_managed_networking from assisted service

 

Does it need documentation support?

Yes

Reasoning (why it’s important?)

  • We don't have an actual use for the user_managed_networking field, and the coupling between UMN and platform_type keeps causing issues over time (especially UMN and the none platform)

 

 

Allow the user to decide which platform is compatible with each feature, especially UMN and CMN.

e.g. on the networking step, when a platform is selected, the UI needs to know whether to show the user the UMN or CMN networking configuration, without taking cluster.user_managed_networking into consideration.

 

This task's goal is to give the UI the option to not use the current UMN implementation, and to give the BE the flexibility to "break" the API.

When creating a cluster in the UI, there is a checkbox that the user can set to indicate that they want to use custom manifests.

Presently this will cause the upload of an empty manifest, the presence of which is later used to determine whether the checkbox is checked or not (and whether the custom manifest tab should be shown in the UI).

This is a clunky approach that confuses the user and leads to validation issues.

This functionality needs to be changed to use a cluster tag for this purpose instead.

Presently, when creating a cluster in the UI, there is a checkbox that the user can set to indicate that they want to use custom manifests.

Presently this will cause the upload of an empty manifest, the presence of which is later used to determine whether the checkbox is checked or not (and whether the custom manifest tab should be shown in the UI).

This is a clunky approach that confuses the user and leads to validation issues.

To remedy this, we would like to give the UI team a facility to store
raw JSON data containing freeform UI-specific settings for a cluster.

This PR enables that.

Feature goal (what are we trying to solve here?)

  • When using ACM/MCE with the infrastructure operator, automatically import the local cluster to enable adding nodes

DoD (Definition of Done)

When the infrastructure operator is enabled, automatically import the cluster and enable users to add nodes to the self-cluster via the infrastructure operator

Does it need documentation support?

Yes, it's a new functionality that will need to be documented

Feature origin (who asked for this feature?)

Reasoning (why it’s important?)

  • Right now, in order to enable this flow, the user needs to install MCE, enable the infrastructure operator, and follow this guide in order to add nodes using the infrastructure operator; we would like to make this process easier for the users
  • it will automatically provide an easy start with CIM

Competitor analysis reference

  • Do our competitors have this feature? N/A

Feature usage (do we have numbers/data?)

  • We are not collecting MCE data yet
  • We were asked several times by customer support how to run this flow

Feature availability (why should/shouldn't it live inside the UI/API?)

  • Please describe the reasoning behind why it should/shouldn't live inside the UI/API - UI will benefit from it by having the cluster prepared for the user
  • If it's for a specific customer we should consider using AMS - Some users would like to manage the cluster locally, otherwise why did they install MCE?
  • Does this feature exist in the UI of other installers? - No

Open questions

  1. How to handle static network - infrastructure operator is not aware of how the local network was defined
  2. How the api should look like? should the user specify a target namespace or should it be automatic?
  3. How to get the local kubeconfig? can we use the one that the pod is using?

When assisted service is started in KubeAPI mode, we want to ensure that the local cluster is registered with ACM so that it may be managed in a similar fashion to a spoke, or to put it another way, register the Hub cluster as a Day 2 spoke cluster in ACM running on itself.

The purpose of this task is to create the required secrets, agentclusterinstall and clusterdeployment CR's required to register the hub.

As referenced in the parent Epic, the following guide details the CR's that need to be created to import a "Day 2" spoke cluster https://github.com/openshift/assisted-service/blob/master/docs/hive-integration/import-installed-cluster.md

During this change, it should be ensured that this functionality is added to the reconcile loop of the service.

note: just a placeholder for now

 
It has already happened that operators configured Prometheus rules which aren't valid:

While we can't catch everything, it should be feasible to check for most common mistakes with the CI.
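One simple check that catches a common class of mistakes is parsing every rule expression with the upstream PromQL parser; a minimal sketch (not the existing CI tooling):

package rulecheck

import (
	"fmt"

	"github.com/prometheus/prometheus/promql/parser"
)

// checkExpressions parses each rule expression and reports the ones that are
// not valid PromQL. The map key is a rule name used only for error messages.
func checkExpressions(exprs map[string]string) []error {
	var errs []error
	for rule, expr := range exprs {
		if _, err := parser.ParseExpr(expr); err != nil {
			errs = append(errs, fmt.Errorf("rule %q: invalid expression: %w", rule, err))
		}
	}
	return errs
}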

Exceptions for the following alerts can be cleared, as the corresponding Bugzillas are already fixed and released.

  • CsvAbnormalFailedOver2Min
  • CsvAbnormalOver30Min
  • InstallPlanStepAppliedWithWarnings

For the BZs not fixed, create new Jira OCPBUGS

We added E2E tests for alerting style-guide issues in MON-1643, but a lot of components needed exceptions. We filed bugzillas for these, but we need to check on them and remove the exceptions for any that are fixed.

Epic Goal

  • Scrape Profiles was introduced as Tech Preview in 4.13; the goal is now to promote it to GA
  • Scrape Profiles Enhancement Proposal should be merged
  • OpenShift developers that want to adopt the feature should have the necessary tooling and documentation on how to do so
  • OpenShift CI should validate if possible changes in profiles that might break a profile or cluster functionality

This has no link to a planing session, as this predates our Epic workflow definition.

Why is this important?

  • Enables users to minimize the resource overhead for Monitoring.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

  1. https://issues.redhat.com/browse/MON-2483

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

CMO should expose a metric that gives insight into collection profile usage. We will add this signal to our telemetry payload.

The minimum solution here is to expose a metric about the collection profile configured.
Other optional metrics could include:

  • how many ServiceMonitors implement a specific profile
  • how many SMs are "unprofiled"
  • ...?
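A minimal sketch of the minimum solution with client_golang, exposing the configured profile as an info-style gauge (the metric name and label are assumptions, not the final CMO metric):

package metrics

import "github.com/prometheus/client_golang/prometheus"

// collectionProfile reports the currently configured collection profile:
// the active profile's series is set to 1.
var collectionProfile = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "cluster_monitoring_collection_profile", // assumed name
		Help: "Currently configured monitoring collection profile.",
	},
	[]string{"profile"},
)

func registerCollectionProfileMetric(reg prometheus.Registerer) {
	reg.MustRegister(collectionProfile)
}

// setCollectionProfile records the active profile, clearing any previous one.
func setCollectionProfile(profile string) {
	collectionProfile.Reset()
	collectionProfile.WithLabelValues(profile).Set(1)
}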

Epic Goal

  • CMO currently has several ServiceMonitor and Rule objects that belong to other components
  • We should migrate these away from the CMO code base to the owning teams

Why is this important?

  • The respective component teams are the experts for their components and can more accurately decide on how to alert and what metrics to expose.
  • Issues with these artifacts get routed to the correct teams.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

If I understand correctly, before using hard (anti-)affinities for HA components, we needed this logic to avoid scheduling problems during upgrades.

See https://github.com/openshift/cluster-monitoring-operator/pull/1431#issuecomment-960845938

Now that 4.8 is no longer supported, we can get rid of this logic to simplify the code.

 

  • It is unclear whether "no longer supported" also applies to 4.8 clusters that may still want to upgrade someday (in that case, we would never be able to get rid of the code).
  • Maybe keep this mechanism for influencing upgrades somewhere (the git history may be sufficient), as we may need to reuse it in the future.

 

This will reduce technical debt and improve CMO learning curve.

To support the transition from soft anti-affinity to hard anti-affinity (4.9 → 4.10), CMO gained the ability to rebalance PVCs for Prometheus pods. The capability isn't required anymore, so we can safely remove it.
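For context, the hard anti-affinity applied to the Prometheus pods looks roughly like this (a sketch; the label selector is illustrative, not the exact CMO manifest):

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus   # keep Prometheus replicas on different nodes
      topologyKey: kubernetes.io/hostname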

Epic Goal

Why is this important?

Scenarios
1. …

Acceptance Criteria

  • (Enter a list of Acceptance Criteria unique to the Epic)

Dependencies (internal and external)
1. …

Previous Work (Optional):
1. …

Open questions::
1. …

Done Checklist

  • CI - For new features (non-enablement), existing Multi-Arch CI jobs are not broken by the Epic
  • Release Enablement: <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - If the Epic is adding a new stream, downstream build attached to advisory: <link to errata>
  • QE - Test plans in Test Plan tracking software (e.g. Polarion, RQM, etc.): <link or reference to the Test Plan>
  • QE - Automated tests merged: <link or reference to automated tests>
  • QE - QE to verify documentation when testing
  • DOC - Downstream documentation merged: <link to meaningful PR>
  • All the stories, tasks, sub-tasks and bugs that belong to this epic need to have been completed and indicated by a status of 'Done'.

Flags similar to these https://github.com/openshift/hypershift/blob/main/cmd/cluster/powervs/create.go#L57toL61 from the create command are missing in the destroy command, so the infra destroy flow does not receive the flags it needs to properly destroy infrastructure that uses existing resources.

  1. No error is thrown on new cloud connection creation when two already exist.
  2. No error is reported for a failed PowerVS job.

When resources run short in the management cluster as we deploy new apps, the cloud-controller-manager pod in an existing HC's control plane can get evicted.

 

Getting the below error while deleting infra with a failed PowerVS instance:

 

Failed to destroy infrastructure        {"error": "error in destroying infra: provided cloud instance id is not in active state, current state: failed"}

 

The create-infra process also needs to handle the case where the PowerVS instance goes into a failed state; currently it loops printing the same statement while waiting for the instance to become active.

 

2022-11-11T13:03:01+05:30       INFO    hyp-dhar-osa-2  Waiting for cloud instance to up        {"id": "crn:v1:bluemix:public:power-iaas:osa21:a/c265c8cefda241ca9c107adcbbacaa84:cd743ba9-195b-46ba-951e-639f97f443d2::", "state": "failed"}

With the latest changes, CAPI expects v1beta2 APIs by default. We need to upgrade the CAPI API from v1beta1 to v1beta2 in HyperShift.

The following issues need to be taken care of on cluster deletion with resource reuse flags.

  1. Currently it tries to remove the DHCP server on an existing PowerVS instance; we need to reuse the existing one to keep things simple.
  2. When reusing an existing VPC, the load balancer is not getting removed.

https://github.com/openshift/cluster-image-registry-operator/commit/eac9584446660721c5a31f54fd342f01415a8e92

 

With the above commit in 4.13, storage is not handled for the PowerVS platform, which causes the cluster image-registry operator to not get installed.

 

We need to handle the PowerVS platform here.

The option discussed is to go with a PVC backed by CSI.

If that is not feasible, we will try to use the IBM COS approach used by the Satellite team.
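If the PVC-with-CSI option is chosen, the registry storage configuration would look roughly like this (a sketch; an empty claim lets the operator create the PVC itself):

apiVersion: imageregistry.operator.openshift.io/v1
kind: Config
metadata:
  name: cluster
spec:
  managementState: Managed
  storage:
    pvc:
      claim: ""   # empty: the operator provisions the image-registry-storage PVC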

Epic Goal

  • The goal of this epic is to update our owned repos with security scans using SNYK and to update the insecure dependencies.

Why is this important?

  • This is important because it can help us improve the overall security of shipped containers and pre-detect situations that would otherwise surface as emergency security fixes.

Additional Context

Acceptance Criteria

Epic Goal

NVIDIA and Microsoft have partnered to provide instances on Azure that use the security of the NVIDIA Hopper GPU to create a Trusted Execution Environment (TEE) where the data is encrypted while being processed. This is achieved by using AMD's SEV-SNP extension, alongside the NVIDIA Hopper confidential computing capabilities.

The virtual machine created on Azure is the TEE, so any workload running within is protected from the Azure host. This is a good approach for customers to protect their data when running OpenShift on Azure, but it doesn't protect the data in a container from the OpenShift node. In this epic, we focus on protecting the OpenShift node from the Azure host.

Why is this important?

Running workloads in CSP virtual machines doesn't protect the data from an attack on the virtualization host itself. If an attacker manages to read the host memory, they can get access to the virtual machines data, so it can break confidentiality or integrity. In the context of AI/ML, both the data and the model represent intellectual property and sensitive data, so customers will want to protect them from leaks.

NVIDIA and Microsoft are key partners for Red Hat for AI/ML in the public cloud. Being able to run workloads encrypted at rest, in transport and in process will allow creating a trusted solution for our customers, spanning from self-managed OpenShift clusters to Azure Red Hat OpenShift (ARO) clusters. This will strengthen OpenShift as the Kubernetes distribution of choice in public clouds.

Scenario

  1. As an OpenShift administrator, I want to add OpenShift nodes with NVIDIA GPU and confidential computing enabled. These nodes are deployed via a MachineSet, like any other node, i.e. the experience is identical to normal nodes.
  2. As an Azure Red Hat OpenShift (ARO) customer, I want to add OpenShift nodes with NVIDIA GPU and confidential computing enabled. These nodes are deployed with the same mechanism as other nodes in ARO.

Acceptance Criteria

  • CI - Must be running successfully with tests automated.
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • Docs - Add confidential VMs configuration to the OpenShift documentation.
  • Marketing - Joint blog post with Microsoft and NVIDIA for self-managed cluster.
  • Marketing - Joint blog post with Microsoft and NVIDIA for ARO cluster.

Dependencies (internal and external)

  • NVIDIA signed open source kernel driver for RHEL
  • Attestation capabilities in the NVIDIA open source kernel driver for RHEL
  • NVIDIA GPU Operator with precompiled driver container

Previous Work & References:

 

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

Add support for OCP cluster creation with Confidential VMs on Azure to the OpenShift installer. The additional configuration options required are:

  • SecurityEncryptionType (enum)
  • SecureVirtualMachineEncryptionSetID (string)
  • SecureBoot (enum)
  • VirtualTrustedPlatformModule (enum)

In addition, in order to create a Confidential VM in Azure, the OS image needs to have its Security Type defined as "Confidential VM" or "Confidential VM Supported".
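A hypothetical sketch of how these options could surface in an install-config machine pool; the field names mirror the list above and the instance type is only an example, not necessarily the final schema:

controlPlane:
  platform:
    azure:
      type: Standard_DC4ads_v5                   # confidential-computing capable size (example)
      securityEncryptionType: VMGuestStateOnly
      secureVirtualMachineEncryptionSetID: ""    # optional disk encryption set for the VM guest state
      secureBoot: Enabled
      virtualTrustedPlatformModule: Enabled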

The changes required are:

  • add Confidential VM options to the install-config schema
  • enable Azure Confidential VM instance type families in install-config validation
  • add Confidential VM options to the Azure Machine Pool type
  • add Confidential VM options to Azure terraform variables
  • add Confidential VM options to the bootstrap and master Azure terraform modules
  • update the Azure terraform provider to support the Confidential VM Security Types for the OS image definitions
  • add the Confidential VM Security Type to the Azure terraform vnet module image

Resources:

The last version of OpenShift on RHV should target OpenShift 4.13. There are several factors for this requirement.

  1. Customers are regularly keeping up with OpenShift releases on RHV.
  2. The general guidance for a major deprecation like removal of a supported platform is 3 releases. Since we are announcing this with OpenShift 4.11, the last supported version should target OCP 4.13.
  3. From a timing point of view OCP 4.13 is coming very soon. OCP releases are every 4 months, OCP 4.12 is Q4 2022, and OCP 4.13 is Q1 2023, just six months away.
  4. The support for OCP is six months of full support + 12 months of maintenance support. This aligns the end of maintenance support for RHV in Aug 2024 with OCP 4.13 (approx. Sep 2024).
  5. There are no new features planned or expected for OCP on RHV after OCP 4.12. We have no plans to revert anything that we have already NAKed.

previous: The last OCP on RHV version will be 4.13. Remove RHV from OCP in OCP 4.14.

https://access.redhat.com/support/policy/updates/rhev

On August 31, 2022, Red Hat Virtualization enters the maintenance support phase, which runs until August 31, 2024. In accordance, Red Hat Virtualization (RHV) will be deprecated beginning with OpenShift v4.13. This means that RHV will be supported through OCP 4.13. RHV will be removed from OpenShift in OpenShift v4.14.

We will use this to address tech debt in OLM in the 4.10 timeframe.

 

Items to prioritize are:

CI e2e flakes

 

It has been determined that "make verify" is a necessary part of the downstream process. The scripts that do the downstreaming do not run this command.

Add "make verify" somewhere in the downstreaming scripts, either as a last step in sync.sh or per commit (which might be both necessary yet overkill) in sync_pop_candidate.sh.

Update the downstream READMEs to better describe the downstreaming process.

Include help in the sync scripts as necessary.

The client cert/key pair is a way of authenticating that will function even without live kube-apiserver connections, so we can collect metrics even if the kube-apiserver is unavailable.
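For illustration, a Prometheus scrape job that authenticates with a client cert/key pair instead of a bearer token could look like this (a sketch; the job name, target, and file paths are placeholders):

scrape_configs:
- job_name: metrics-client-cert
  scheme: https
  tls_config:
    ca_file: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
    cert_file: /etc/prometheus/secrets/metrics-client-certs/tls.crt   # client certificate
    key_file: /etc/prometheus/secrets/metrics-client-certs/tls.key    # client private key
  static_configs:
  - targets: ['example-target:8443']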

Revived from OCSCNV-56 which was archived.

We need a solution to support OCS encrypted volumes for CNV so that smart cloning across namespaces can be achieved for encrypted volumes.

The problem with encrypted OCS volumes is that the secrets are stored in the original namespace and get left behind (the cloned metadata still points to the original namespace).

The annotation required is `cdi.kubevirt.io/clone-strategy=copy`.
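Applied to a StorageClass, this would look roughly as follows (a sketch; the StorageClass name and provisioner are illustrative, and encryption-specific parameters are omitted):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ocs-storagecluster-ceph-rbd-encrypted      # example encrypted RBD StorageClass
  annotations:
    cdi.kubevirt.io/clone-strategy: copy           # tell CDI to copy instead of smart clone
provisioner: openshift-storage.rbd.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate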

Tasks:

  • Add the annotation to the default encrypted RBD StorageClass.
  • Add the annotation to UI-created encrypted RBD StorageClasses.
  • Add a KCS article/document on manually annotating previously UI-created encrypted RBD StorageClasses.

PPT Link: https://ibm-my.sharepoint.com/:p:/p/sanjal_dhir_katiyar/ESnXPI-TwmpPn3D9nliC6TMBKH1X_C7Xvth_tNXJZc3ubQ?e=QufNMW

Need to update: "console.storage-class/provisioner" extension.
Ref: https://github.com/openshift/console/pull/11931

Something like:

"properties": {
  "CSI": {
    . .
    "parameter": { . . },
    "annotations": {
      [annotationKey: string]: {
        "value"?: string,
        "annotate"?: CodeRef<(arg) => boolean | boolean>
      }
    }
    . .
  }
}
We can do the same for `properties.others.annotations` as well (not a requirement, but for consistency with `properties.csi.annotations`).

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • ...

Why is this important?

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions::

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

As an administrator of a cluster utilizing AWS STS with a public S3 bucket OIDC provider, I would like a documented procedure with steps that can be followed to migrate to a private S3 bucket with CloudFront Distribution so that I do not have to recreate my cluster.

ccoctl documentation including parameter `--create-private-s3-bucket`: https://github.com/openshift/cloud-credential-operator/blob/a8ee8a426d38cca3f7339ecd0eac88f922b6d5a0/docs/ccoctl.md

Existing manual procedure for configuring private S3 bucket with CloudFront Distribution: https://github.com/openshift/cloud-credential-operator/blob/master/docs/sts-private-bucket.md
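For reference, the private bucket plus CloudFront layout is what ccoctl creates when the documented flag is passed (a sketch; the name, region, and key file are placeholders):

$ ccoctl aws create-identity-provider \
    --name my-cluster-oidc \
    --region us-east-2 \
    --public-key-file serviceaccount-signer.public \
    --create-private-s3-bucket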

https://coreos.slack.com/archives/CE3ETN3J8/p1666174054230389?thread_ts=1665496599.847459&cid=CE3ETN3J8

Goal:

The participation on SPLAT will be:

 

ACCEPTANCE CRITERIA

  • Document created on CCO repo, reviewed, approved by QE and merged
  • KCS/Article created

 

REFERENCES:

Supporting document: https://github.com/openshift/cloud-credential-operator/blob/master/docs/sts.md#steps-to-in-place-migrate-an-openshift-cluster-to-sts

NOTE: we should add that this step is not supported or recommended.

 

We have identified gaps in our test coverage that monitors which alerts are acceptable to fire during cluster upgrades; these gaps need to be addressed to make sure we are not allowing regressions into the product.

This epic is to group that work.

This will make transitioning to new releases very simple because ci-tools doesn't need any logic; it just makes sure to include the current + previous release data in the file and in the PR going to origin. Origin is then responsible for the logic that determines which data to use: it will check whether we have at least 100 runs for the current release and, if not, fall back to the previous release data. All other fallback logic should remain in place.

Other Complete

This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled

Description of problem:

RHEL-7 already comes with `xz` installed, but in RHEL-8 it needs to be explicitly installed.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

always

Steps to Reproduce:

1. Use an image based on Dockerfile.upi.ci.rhel8
2. Trigger a CI job that uses the xz tool
3.

Actual results:

/bin/sh: xz: command not found
tar: /tmp/secret/terraform_state.tar.xz: Wrote only 4096 of 10240 bytes
tar: Child returned status 127
tar: Error is not recoverable: exiting now 

Expected results:

no errors

Additional info:

Step: https://github.com/openshift/release/blob/master/ci-operator/step-registry/upi/install/vsphere/upi-install-vsphere-commands.sh#L185

And investigation by Jinyun Ma: https://github.com/openshift/release/pull/39991#issuecomment-1581937323
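The likely fix is to install the tool explicitly in the RHEL-8 based image, along these lines (a sketch; the exact Dockerfile layer and package manager invocation may differ):

# Dockerfile.upi.ci.rhel8 (sketch)
RUN dnf install -y xz && dnf clean all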

Description of problem:

When deploying a disconnected cluster with the installer, the image-registry operator will fail to deploy because it cannot reach the COS endpoint.

Version-Release number of selected component (if applicable):

 

How reproducible:

Easily

Steps to Reproduce:

1. Deploy a disconnected cluster with the installer
2. Watch the image-registry operator, it will  fail to deploy

Actual results:

image-registry operator doesn't deploy because the COS endpoint is unreachable.

Expected results:

image-registry operator should deploy

Additional info:

Fix identified.

OVN-IC doesn't use RAFT and doesn't need to wait for the cluster to converge, so we no longer need the 90s delay on the readiness probes of the NB and SB containers.

I think we only want to do this for multi-zone interconnect though, since the other deployment types would still use some RAFT.
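The change boils down to dropping (or sharply reducing) the probe delay on those containers, roughly like this (a sketch; the container name and remaining probe values are illustrative):

containers:
- name: nbdb
  readinessProbe:
    initialDelaySeconds: 10   # was ~90s to allow RAFT convergence; unnecessary for OVN-IC
    periodSeconds: 10
    timeoutSeconds: 5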

Description of problem:

Fail to collect the vm serial log with ‘openshift-install gather bootstrap’

Version-Release number of selected component (if applicable):

 4.13.0-0.nightly-2023-03-14-053612

How reproducible:

Always

Steps to Reproduce:

1. IPI install a private cluster. Once the bootstrap node boots up, and before it is terminated,
2. ssh to the bastion, then try to get the bootstrap logs:
$ openshift-install gather bootstrap --key openshift-qe.pem --bootstrap 10.0.0.5 --master 10.0.0.7 --loglevel debug
3.

Actual results:

Failed to get the VM serial logs; in the output:
…
DEBUG Gather remote logs                           
DEBUG Collecting info from 10.0.0.6                
DEBUG scp: ./installer-masters-gather.sh: Permission denied 
DEBUG Warning: Permanently added '10.0.0.6' (ECDSA) to the list of known hosts.
…
DEBUG Waiting for logs ...                         
DEBUG Log bundle written to /var/home/core/log-bundle-20230317033401.tar.gz 
WARNING Unable to stat /var/home/core/serial-log-bundle-20230317033401.tar.gz, skipping 
INFO Bootstrap gather logs captured here "/var/home/core/log-bundle-20230317033401.tar.gz"

Expected results:

The VM serial log is collected, and the output does not contain the above "WARNING Unable to stat…" message.

Additional info:

A local IPI install has the same issue.
INFO Pulling VM console logs                     
DEBUG attemping to download                       
…                       
INFO Failed to gather VM console logs: unable to download file: /root/temp/4.13.0-0.nightly-2023-03-14-053612/ipi/serial-log-bundle-20230317042338

Description of problem: 

The "pipelines-as-code-pipelinerun-go" ConfigMap is not being used for the Go repository while creating a Pipeline Repository; the "pipelines-as-code-pipelinerun-generic" ConfigMap is used instead.

Prerequisites (if any, like setup, operators/versions):

Install Red Hat Pipeline operator

Steps to Reproduce

  1. Navigate to Create Repository form 
  2. Enter the Git URL `https://github.com/vikram-raj/hello-func-go`
  3. Click on Add

Actual results:

The `pipelines-as-code-pipelinerun-generic` PipelineRun template is shown on the overview page.

Expected results:

The `pipelines-as-code-pipelinerun-go` PipelineRun template should be shown on the overview page.

Reproducibility (Always/Intermittent/Only Once):

Build Details:

4.13

Workaround:

Additional info:

Add a page to our documentation to describe what information needs to be gathered in the case of a failure/bug.

Document how to use the `hypershift dump cluster` command.

  • Support impersonate flag to make it easier to run against prod envs.

Description of problem:

tested https://issues.redhat.com/browse/OCPBUGS-10387 with PR

launch 4.14-ci,openshift/cluster-monitoring-operator#1926 no-spot

3 masters, 3 workers, each node has 4 CPUs, no infra node

$ oc get node
NAME                                         STATUS   ROLES                  AGE   VERSION
ip-10-0-132-193.us-east-2.compute.internal   Ready    control-plane,master   23m   v1.26.2+d2e245f
ip-10-0-135-65.us-east-2.compute.internal    Ready    control-plane,master   23m   v1.26.2+d2e245f
ip-10-0-149-72.us-east-2.compute.internal    Ready    worker                 14m   v1.26.2+d2e245f
ip-10-0-158-0.us-east-2.compute.internal     Ready    worker                 14m   v1.26.2+d2e245f
ip-10-0-229-135.us-east-2.compute.internal   Ready    worker                 17m   v1.26.2+d2e245f
ip-10-0-234-36.us-east-2.compute.internal    Ready    control-plane,master   23m   v1.26.2+d2e245f

labels see below

control-plane: node-role.kubernetes.io/control-plane: ""
master: node-role.kubernetes.io/master: ""
worker: node-role.kubernetes.io/worker: ""

Search for "cluster:capacity_cpu_cores:sum" in the admin console under "Observe -> Metrics"; the series with label_node_role_kubernetes_io="master" and with label_node_role_kubernetes_io="" are each reported twice.

Name                label_beta_kubernetes_io_instance_type    label_kubernetes_io_arch    label_node_openshift_io_os_id    label_node_role_kubernetes_io    prometheus            Value
cluster:capacity_cpu_cores:sum  m6a.xlarge                amd64                rhcos                                openshift-monitoring/k8s    12
cluster:capacity_cpu_cores:sum  m6a.xlarge                amd64                rhcos                master                openshift-monitoring/k8s    12
cluster:capacity_cpu_cores:sum  m6a.xlarge                amd64                rhcos                                openshift-monitoring/k8s    12
cluster:capacity_cpu_cores:sum  m6a.xlarge                amd64                rhcos                master                openshift-monitoring/k8s    12 

Checked via the thanos-querier API; same result as from the console UI (the console UI uses the thanos-querier API).

$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=cluster:capacity_cpu_cores:sum' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "cluster:capacity_cpu_cores:sum",
          "label_beta_kubernetes_io_instance_type": "m6a.xlarge",
          "label_kubernetes_io_arch": "amd64",
          "label_node_openshift_io_os_id": "rhcos",
          "prometheus": "openshift-monitoring/k8s"
        },
        "value": [
          1682394655.248,
          "12"
        ]
      },
      {
        "metric": {
          "__name__": "cluster:capacity_cpu_cores:sum",
          "label_beta_kubernetes_io_instance_type": "m6a.xlarge",
          "label_kubernetes_io_arch": "amd64",
          "label_node_openshift_io_os_id": "rhcos",
          "label_node_role_kubernetes_io": "master",
          "prometheus": "openshift-monitoring/k8s"
        },
        "value": [
          1682394655.248,
          "12"
        ]
      },
      {
        "metric": {
          "__name__": "cluster:capacity_cpu_cores:sum",
          "label_beta_kubernetes_io_instance_type": "m6a.xlarge",
          "label_kubernetes_io_arch": "amd64",
          "label_node_openshift_io_os_id": "rhcos",
          "prometheus": "openshift-monitoring/k8s"
        },
        "value": [
          1682394655.248,
          "12"
        ]
      },
      {
        "metric": {
          "__name__": "cluster:capacity_cpu_cores:sum",
          "label_beta_kubernetes_io_instance_type": "m6a.xlarge",
          "label_kubernetes_io_arch": "amd64",
          "label_node_openshift_io_os_id": "rhcos",
          "label_node_role_kubernetes_io": "master",
          "prometheus": "openshift-monitoring/k8s"
        },
        "value": [
          1682394655.248,
          "12"
        ]
      }
    ]
  }
} 

There is no such issue if we query the recording rule expression for "cluster:capacity_cpu_cores:sum" directly:

Name                label_beta_kubernetes_io_instance_type    label_kubernetes_io_arch    label_node_openshift_io_os_id    label_node_role_kubernetes_io    prometheus             Value
cluster:capacity_cpu_cores:sum    m6a.xlarge                amd64                rhcos                                openshift-monitoring/k8s    12
cluster:capacity_cpu_cores:sum    m6a.xlarge                amd64                rhcos                master                openshift-monitoring/k8s    12 

Deduplication should be applied to the thanos-querier API results.

Version-Release number of selected component (if applicable):

tested https://issues.redhat.com/browse/OCPBUGS-10387 with PR

How reproducible:

always

Steps to Reproduce:

1. see the description
2.
3.

Actual results:

The node role series are returned twice by the thanos-querier API.

Expected results:

The node role series should be returned only once by the thanos-querier API.

This is a clone of issue OCPBUGS-13152. The following is the description of the original issue:

Description of problem:
With OCPBUGS-11099 our Pipeline Plugin supports the TektonConfig config "embedded-status: minimal" option that will be the default in OpenShift Pipelines 1.11+.

But since this change, the Pipeline pages load the TaskRuns for all Pipeline and PipelineRun rows. To decrease the risk of a performance issue, we should make this call only if status.tasks isn't defined.

Version-Release number of selected component (if applicable):

  • 4.12-4.14, as soon as OCPBUGS-11099 is backported.
  • Tested with Pipelines operator 1.10.1

How reproducible:
Always

Steps to Reproduce:

  1. Install Pipelines operator
  2. Import a Git repository and enable the Pipeline option
  3. Open the browser network inspector
  4. Navigate to the Pipeline page

Actual results:
The list page loads a list of TaskRuns for each Pipeline / PipelineRun even if the PipelineRun already contains the related data (status.tasks).

Expected results:
No unnecessary network calls. When the admin changes the TektonConfig "embedded-status" option to minimal, the UI should still work and load the TaskRuns as it does today.

Additional info:
None

This is a clone of issue OCPBUGS-23131. The following is the description of the original issue:

The final iteration (of 3) of the fix for OCPBUGS-4248 - https://github.com/openshift/cluster-baremetal-operator/pull/341 - uses the (IPv6) API VIP as the IP address for IPv6 BMCs to contact Apache to download the image to mount via virtualmedia.

When the provisioning network is active, this should use the (IPv6) Provisioning VIP unless the virtualMediaViaExternalNetwork flag is true.

We should log the vCenter version information in plain text.

There are cases in the code where the vCenter version we receive from vCenter can be unparseable. I see errors in the problem detector while parsing the version, and both the CSI driver and the operator depend on the ability to determine the vCenter version.

Description of problem:

When using the agent-based installer to provision OCP, the validation failed with the following message:
"id": "sufficient-installation-disk-speed"
"status": "failure"
"message": "While preparing the previous installation the installation disk speed measurement failed or was found to be insufficient"


Version-Release number of selected component (if applicable):

4.13.0
{
  "versions": {
    "assisted-installer": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3a8b33263729ab42c0ff29b9d5e8b767b7b1a9b31240c592fa8d173463fb04d1",
    "assisted-installer-controller": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ce3e2e4aac617077ac98b82d9849659595d85cd31f17b3213da37bc5802b78e1",
    "assisted-installer-service": "Unknown",
    "discovery-agent": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70397ac41dffaa5f3333c00ac0c431eff7debad9177457a038b6e8c77dc4501a"
  }
}

How reproducible:

100%

Steps to Reproduce:

1. Using agent based installer provision the DELL 16G server
2. 
3.

Actual results:

Validation failed with "sufficient-installation-disk-speed"

Expected results:

Validation pass

Additional info:

[root@c2-esx02 bin]# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0 125.7G  0 loop /var/lib/containers/storage/overlay
                                      /var
                                      /etc
                                      /run/ephemeral
loop1         7:1    0   934M  0 loop /usr
                                      /boot
                                      /
                                      /sysroot
nvme1n1     259:0    0   1.5T  0 disk
nvme0n1     259:2    0 894.2G  0 disk
├─nvme0n1p1 259:6    0     2M  0 part
├─nvme0n1p2 259:7    0    20M  0 part
├─nvme0n1p3 259:8    0  93.1G  0 part
├─nvme0n1p4 259:9    0 701.9G  0 part
└─nvme0n1p5 259:10   0  99.2G  0 part
nvme2n1     259:3    0   1.5T  0 disk
nvme4n1     259:4    0   1.5T  0 disk
nvme3n1     259:5    0   1.5T  0 disk
[root@c2-esx02 bin]# ls -lh /dev |grep nvme
crw-------.   1 root root    239,     0 Jun 12 06:01 nvme0
-rw-r--r--.   1 root root          4.0M Jun 12 06:04 nvme0c0n1
brw-rw----.   1 root disk    259,     2 Jun 12 06:01 nvme0n1
brw-rw----.   1 root disk    259,     6 Jun 12 06:01 nvme0n1p1
brw-rw----.   1 root disk    259,     7 Jun 12 06:01 nvme0n1p2
brw-rw----.   1 root disk    259,     8 Jun 12 06:01 nvme0n1p3
brw-rw----.   1 root disk    259,     9 Jun 12 06:01 nvme0n1p4
brw-rw----.   1 root disk    259,    10 Jun 12 06:01 nvme0n1p5
crw-------.   1 root root    239,     1 Jun 12 06:01 nvme1
brw-rw----.   1 root disk    259,     0 Jun 12 06:01 nvme1n1
crw-------.   1 root root    239,     2 Jun 12 06:01 nvme2
brw-rw----.   1 root disk    259,     3 Jun 12 06:01 nvme2n1
crw-------.   1 root root    239,     3 Jun 12 06:01 nvme3
brw-rw----.   1 root disk    259,     5 Jun 12 06:01 nvme3n1
crw-------.   1 root root    239,     4 Jun 12 06:01 nvme4
brw-rw----.   1 root disk    259,     4 Jun 12 06:01 nvme4n1
[root@c2-esx02 bin]# lsblk -f nvme0c0n1
lsblk: nvme0c0n1: not a block device
[root@c2-esx02 bin]# ls -l /dev/disk/by-id/
total 0
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 google-CN0WW56VFCP0033900HU -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 google-CN0WW56VFCP0033900HU-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 google-CN0WW56VFCP0033900HU-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 google-CN0WW56VFCP0033900HU-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 google-CN0WW56VFCP0033900HU-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 google-CN0WW56VFCP0033900HU-part5 -> ../../nvme0n1p5
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 google-PHAB112600291P9SGN -> ../../nvme3n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 google-PHAB115400P81P9SGN -> ../../nvme2n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 google-PHAB120401CP1P9SGN -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 google-PHAB124501MF1P9SGN -> ../../nvme4n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-Dell_BOSS-N1_CN0WW56VFCP0033900HU -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-Dell_BOSS-N1_CN0WW56VFCP0033900HU-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-Dell_BOSS-N1_CN0WW56VFCP0033900HU-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-Dell_BOSS-N1_CN0WW56VFCP0033900HU-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-Dell_BOSS-N1_CN0WW56VFCP0033900HU-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-Dell_BOSS-N1_CN0WW56VFCP0033900HU-part5 -> ../../nvme0n1p5
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-Dell_Ent_NVMe_P5600_MU_U.2_1.6TB_PHAB112600291P9SGN -> ../../nvme3n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-Dell_Ent_NVMe_P5600_MU_U.2_1.6TB_PHAB115400P81P9SGN -> ../../nvme2n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-Dell_Ent_NVMe_P5600_MU_U.2_1.6TB_PHAB120401CP1P9SGN -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-Dell_Ent_NVMe_P5600_MU_U.2_1.6TB_PHAB124501MF1P9SGN -> ../../nvme4n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-eui.0050434209000001 -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-eui.0050434209000001-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-eui.0050434209000001-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-eui.0050434209000001-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-eui.0050434209000001-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 nvme-eui.0050434209000001-part5 -> ../../nvme0n1p5
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-eui.01000000000000005cd2e44e7a445351 -> ../../nvme2n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-eui.01000000000000005cd2e48f14515351 -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-eui.01000000000000005cd2e49d3e605351 -> ../../nvme4n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 nvme-eui.01000000000000005cd2e4fd973e5351 -> ../../nvme3n1
[root@c2-esx02 bin]# ls -l /dev/disk/by-path
total 0
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 pci-0000:01:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 pci-0000:01:00.0-nvme-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 pci-0000:01:00.0-nvme-1-part2 -> ../../nvme0n1p2
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 pci-0000:01:00.0-nvme-1-part3 -> ../../nvme0n1p3
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 pci-0000:01:00.0-nvme-1-part4 -> ../../nvme0n1p4
lrwxrwxrwx. 1 root root 15 Jun 12 06:01 pci-0000:01:00.0-nvme-1-part5 -> ../../nvme0n1p5
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 pci-0000:c3:00.0-nvme-1 -> ../../nvme1n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 pci-0000:c4:00.0-nvme-1 -> ../../nvme2n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 pci-0000:c5:00.0-nvme-1 -> ../../nvme3n1
lrwxrwxrwx. 1 root root 13 Jun 12 06:01 pci-0000:c6:00.0-nvme-1 -> ../../nvme4n1

Description of problem:

A leftover comment in CPMSO tests is causing a linting issue.

Version-Release number of selected component (if applicable):

4.13.z, 4.14.0

How reproducible:

Always

Steps to Reproduce:

1. make lint
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

When using oc image mirror, oc creates a new manifest list when filtering platforms. When this happens, oc still tries to push and tag the original manifest list.

Version-Release number of selected component (if applicable):

4.8

How reproducible:

Consistent

Steps to Reproduce:

1. Run oc image mirror --filter-by-os 'linux/arm' docker.io/library/busybox@sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c yourregistry.io/busybox:target
2. Check the plan, see that the original manifest digest is being used for the tag

Actual results:

jammy:Downloads$ oc image mirror --filter-by-os 'linux/arm' docker.io/library/busybox@sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c sparse-registry1.fyre.ibm.com/jammy/busybox:target
sparse-registry1.fyre.ibm.com/
  jammy/busybox
    blobs:
      docker.io/library/busybox sha256:1d57ab16f681953c15d7485bf3ee79a49c2838e5f9394c43e20e9accbb1a2b20 1.436KiB
      docker.io/library/busybox sha256:99ee43e96ff50e90c5753954d7ce2dfdbd7eb9711c1cd96de56d429cb628e343 1.436KiB
      docker.io/library/busybox sha256:a22ab831b2b2565a624635af04e5f76b4554d9c84727bf7e6bc83306b3b339a9 1.436KiB
      docker.io/library/busybox sha256:abaa813f94fdeebd3b8e6aeea861ab474a5c4724d16f1158755ff1e3a4fde8b0 1.438KiB
      docker.io/library/busybox sha256:b203a35cab50f0416dfdb1b2260f83761cb82197544b9b7a2111eaa9c755dbe7 937.1KiB
      docker.io/library/busybox sha256:46758452d3eef8cacb188405495d52d265f0c3a7580dfec51cb627c04c7bafc4 1.604MiB
      docker.io/library/busybox sha256:4c45e4bb3be9dbdfb27c09ac23c050b9e6eb4c16868287c8c31d34814008df80 1.847MiB
      docker.io/library/busybox sha256:f78e6840ded1aafb6c9f265f52c2fc7c0a990813ccf96702df84a7dcdbe48bea 1.908MiB
    manifests:
      sha256:4ff685e2bcafdab0d2a9b15cbfd9d28f5dfe69af97e3bb1987ed483b0abf5a99
      sha256:5e42fbc46b177f10319e8937dd39702e7891ce6d8a42d60c1b4f433f94200bd2
      sha256:7128d7c7704fb628f1cedf161c01d929d3d831f2a012780b8191dae49f79a5fc
      sha256:77ed5ebc3d9d48581e8afcb75b4974978321bd74f018613483570fcd61a15de8
      sha256:dde8e930c7b6a490f728e66292bc9bce42efc9bbb5278bae40e4f30f6e00fe8c
      sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c -> target

Expected results:

jammy:~$ oc-devel image mirror --filter-by-os 'linux/arm' docker.io/library/busybox@sha256:7b3ccabffc97de872a30dfd234fd972a66d247c8cfc69b0550f276481852627c sparse-registry1.fyre.ibm.com/jammy/busybox:target
sparse-registry1.fyre.ibm.com/
  jammy/busybox
    blobs:
      docker.io/library/busybox sha256:1d57ab16f681953c15d7485bf3ee79a49c2838e5f9394c43e20e9accbb1a2b20 1.436KiB
      docker.io/library/busybox sha256:99ee43e96ff50e90c5753954d7ce2dfdbd7eb9711c1cd96de56d429cb628e343 1.436KiB
      docker.io/library/busybox sha256:a22ab831b2b2565a624635af04e5f76b4554d9c84727bf7e6bc83306b3b339a9 1.436KiB
      docker.io/library/busybox sha256:abaa813f94fdeebd3b8e6aeea861ab474a5c4724d16f1158755ff1e3a4fde8b0 1.438KiB
      docker.io/library/busybox sha256:b203a35cab50f0416dfdb1b2260f83761cb82197544b9b7a2111eaa9c755dbe7 937.1KiB
      docker.io/library/busybox sha256:46758452d3eef8cacb188405495d52d265f0c3a7580dfec51cb627c04c7bafc4 1.604MiB
      docker.io/library/busybox sha256:4c45e4bb3be9dbdfb27c09ac23c050b9e6eb4c16868287c8c31d34814008df80 1.847MiB
      docker.io/library/busybox sha256:f78e6840ded1aafb6c9f265f52c2fc7c0a990813ccf96702df84a7dcdbe48bea 1.908MiB
    manifests:
      sha256:4ff685e2bcafdab0d2a9b15cbfd9d28f5dfe69af97e3bb1987ed483b0abf5a99
      sha256:5e42fbc46b177f10319e8937dd39702e7891ce6d8a42d60c1b4f433f94200bd2
      sha256:7128d7c7704fb628f1cedf161c01d929d3d831f2a012780b8191dae49f79a5fc
      sha256:77ed5ebc3d9d48581e8afcb75b4974978321bd74f018613483570fcd61a15de8
      sha256:dde8e930c7b6a490f728e66292bc9bce42efc9bbb5278bae40e4f30f6e00fe8c
      sha256:7128d7c7704fb628f1cedf161c01d929d3d831f2a012780b8191dae49f79a5fc -> target

Additional info:

 

This is a clone of issue OCPBUGS-12707. The following is the description of the original issue:

Description of problem:


When we deploy a cluster in AWS using this template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-sts-private-s3-custom_endpoints-ci, the master MCP is degraded and reports this error:

  - lastTransitionTime: "2023-04-25T07:48:45Z"
    message: 'Node ip-10-0-55-111.us-east-2.compute.internal is reporting: "machineconfig.machineconfiguration.openshift.io
      \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\" not found", Node ip-10-0-60-138.us-east-2.compute.internal
      is reporting: "machineconfig.machineconfiguration.openshift.io \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\"
      not found", Node ip-10-0-69-137.us-east-2.compute.internal is reporting: "machineconfig.machineconfiguration.openshift.io
      \"rendered-master-8ef3f9cb45adb7bbe5f819eb831ffd7d\" not found"'
    reason: 3 nodes are reporting degraded status on sync
    status: "True"
    type: NodeDegraded


Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         3h12m   Error while reconciling 4.14.0-0.nightly-2023-04-19-125337: the cluster operator machine-config is degraded

How reproducible:

2 out of 2.

Steps to Reproduce:

1. Install OCP using this template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-sts-private-s3-custom_endpoints-ci

We can see examples of this installation here:
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/198964/

and here:
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/199028/


Builds have been marked as keep forever, but just in case, the parameters are:

INSTANCE_NAME_PREFIX: Your ID; any short string, just make sure it is unique.
VARIABLES_LOCATION: private-templates/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-sts-private-s3-custom_endpoints-ci
LAUNCHER_VARS: <leave empty>
BUSHSLICER_CONFIG: <leave empty>

Actual results:


The installation failed, reporting a degraded master MCP.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         3h12m   Error while reconciling 4.14.0-0.nightly-2023-04-19-125337: the cluster operator machine-config is degraded

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master                                                      False     True       True       3              0                   0                     3                      4h21m
worker   rendered-worker-166729d2617b1b63cf5d9bb818dd9cf8   True      False      False      3              3                   3                     0                      4h21m


Expected results:

Installation should finish without problems and no MCP should be degraded

Additional info:

Must gather linked in the first comment

Description of problem:

After customizing the routes for Console and Downloads, the `Downloads` route is not updated on the `https://custom-console-route/command-line-tools` page and still points to the old/default downloads route.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Customize Console and Downloads routes.
2. Access the web-console using custom console route.
3. Go to Command-line-tools.
4. Try to access the downloads urls.

Actual results:

While accessing the download URLs, they point to the default/old downloads route.

Expected results:

While accessing the download URLs, they should point to the custom downloads route.

Additional info:
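For reference, route customization of this kind is typically done through the cluster Ingress config componentRoutes stanza, roughly like this (a sketch; hostnames and secret names are placeholders):

apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
spec:
  componentRoutes:
  - name: console
    namespace: openshift-console
    hostname: console.custom.example.com
    servingCertKeyPairSecret:
      name: custom-console-cert
  - name: downloads
    namespace: openshift-console
    hostname: downloads.custom.example.com
    servingCertKeyPairSecret:
      name: custom-downloads-cert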

 

Description of problem:

[vmware csi driver] vsphere-syncer does not retry populating the CSINodeTopology with topology information when registration fails

When the syncer starts, it watches for node events, but it does not retry if registration fails; in the meanwhile, CSINodeTopology requests might not get served because the VM is not found.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-05-04-090524

How reproducible:

Randomly

Steps to Reproduce:

1. Install OCP cluster by UPI with encrypt 
2. Check that the cluster storage operator is not degraded

Actual results:

The cluster storage operator is degraded with: VSphereCSIDriverOperatorCRProgressing: VMwareVSphereDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods

...
2023-05-09T06:06:22.146861934Z I0509 06:06:22.146850       1 main.go:183] ServeMux listening at "0.0.0.0:10300"
2023-05-09T06:07:00.283007138Z E0509 06:07:00.282912       1 main.go:64] failed to establish connection to CSI driver: context canceled
2023-05-09T06:07:07.283109412Z W0509 06:07:07.283061       1 connection.go:173] Still connecting to unix:///csi/csi.sock
...

# Many error logs in the CSI driver about timing out while waiting for topology labels to be updated in the "compute-2" CSINodeTopology instance.

...
2023-05-09T06:19:16.499856730Z {"level":"error","time":"2023-05-09T06:19:16.499687071Z","caller":"k8sorchestrator/topology.go:837","msg":"timed out while waiting for topology labels to be updated in \"compute-2\" CSINodeTopology instance.","TraceId":"b8d9305e-9681-4eba-a8ac-330383227a23","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service/common/commonco/k8sorchestrator.(*nodeVolumeTopology).GetNodeTopologyLabels\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/common/commonco/k8sorchestrator/topology.go:837\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).NodeGetInfo\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/node.go:429\ngithub.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetInfo_Handler\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:6231\ngoogle.golang.org/grpc.(*Server).processUnaryRPC\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:1283\ngoogle.golang.org/grpc.(*Server).handleStream\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:1620\ngoogle.golang.org/grpc.(*Server).serveStreams.func1.2\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/vendor/google.golang.org/grpc/server.go:922"}
...

Expected results:

The vSphere OCP cluster installation succeeds and the cluster storage operator is healthy.

Additional info:

 

This is a clone of issue OCPBUGS-18455. The following is the description of the original issue:

Description of problem:

Some 3rd party clouds do not require the use of an external CCM. The installer enables an external CCM by default whenever the platform is external.

Version-Release number of selected component (if applicable):

4.14 nightly

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

The external CCM can not be disabled when the platform type is external.

Expected results:

The external CCM should be able to be disabled when the platform type is external.

Additional info:

 

This is a clone of issue OCPBUGS-23164. The following is the description of the original issue:

Description of problem:


Unable to edit Shipwright Builds with the upcoming builds for Red Hat OpenShift release (based on Shipwright v0.12.0) in the developer and admin consoles.

Workaround is to use `oc edit build.shipwright.io ...`

Version-Release number of selected component (if applicable):


OCP 4.14
builds for OpenShift v1.0.0

How reproducible:


Always

Steps to Reproduce:


1. Deploy the builds for Red Hat OpenShift release candidate operator
2. Create a Build using the shp command line: `shp build create ...`
3. Open the Dev or Admin console for Shipwright Builds
4. Attempt to edit the Build object

Actual results:


Page appears to "freeze", does not let you edit.

Expected results:


Shipwright Build objects can be edited.

Additional info:


Can be reproduced by deploying the following "test catalog" - quay.io/adambkaplan/shipwright-io/operator-catalog:v0.13.0-rc7, then creating a subscription for the Shipwright operator.

Will likely be easier to reproduce once we have the downstream operator in the Red Hat OperatorHub catalog.

Description of problem:

When adding a repository URL that contains hyphens in the <owner> part of the URL
(<https://github.com/owner/url>, e.g. https://github.com/redhat-developer/s2i-dotnetcore-ex.git), the Create button stays disabled and validation errors are not presented in the UI.

Version-Release number of selected component (if applicable):
4.9

How reproducible:
Always

Steps to Reproduce:
1. Go to Developer -> Add -> Import from Git page
2. use the repo url https://github.com/redhat-developer/s2i-dotnetcore-ex.git
3. add `/app` in the context dir under advanced git options.

Actual results:

Once the builder image is detected, the Create button is disabled but no errors are shown in the form. When the user touches the name field, a name validation error message is shown even though the suggested name is valid.

Expected results:

After detecting the builder image, the create button should be enabled.

Additional info:

Description of problem:

cluster-dns-operator startup has an error message:

[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Start cluster-dns-operator
2. oc edit dnses.operator.openshift.io default
  -> Change operatorLogLevel to "Trace" or "Debug" (it doesn't matter which, we just want to trigger an update)
3. Observe backtrace in logs

Actual results:

[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:
goroutine 201 [running]:
runtime/debug.Stack()
	/usr/lib/golang/src/runtime/debug/stack.go:24 +0x65
sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
	/dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:59 +0xbd
sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithValues(0xc0000bae40, {0xc000768ae0, 0x6, 0x6})
	/dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:168 +0x54
github.com/go-logr/logr.Logger.WithValues(...)
	/dns-operator/vendor/github.com/go-logr/logr/logr.go:323
sigs.k8s.io/controller-runtime/pkg/controller.NewUnmanaged.func1(0xc000991980)
	/dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/controller/controller.go:121 +0x1f6
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0003265a0, {0x1bddf28, 0xc00049d7c0}, {0x17b6120?, 0xc000991960?})
	/dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:305 +0x18b
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0003265a0, {0x1bddf28, 0xc00049d7c0})
	/dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265 +0x1d9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/dns-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:222 +0x587

Expected results:

No error message

Additional info:

This is due to 1.27 rebase: https://github.com/openshift/cluster-dns-operator/pull/368

Description of problem:
Installed and uninstalled some Helm charts and hit an issue where the Helm details page couldn't be loaded successfully. This issue also exists in old versions and is aligned with OCPBUGS-7517.

If the backend fails to load the frontend never stops loading the helm details page.

Version-Release number of selected component (if applicable):
Details page never stops loading

How reproducible:
Always with the Helm chart secret below.

Steps to Reproduce:
Unable to reproduce this manually again.

But you can apply the Secret at the end to any namespace.

You can create this in any namespace, but because it contains the namespace info "christoph", the Helm list page links to a non-existing URL. You can fix that manually or use the namespace "christoph".

Actual results:

  1. Helm release detail page never finishes loading

Expected results:

  1. Helm release detail page should load fine

Additional info:

Secret to reproduce this issue:

kind: Secret
apiVersion: v1
metadata: 
  name: sh.helm.release.v1.dotnet.v1
  labels: 
    name: dotnet
    owner: helm
    status: deployed
    version: '1'
data: 
  release: >-
    H4sIAAAAAAAC/+S9a3ObTNIw/Ff06v74OgkgKxu5aj8YYiEUiUTI4rTZ2mIGDEjD4REgGe2T//7UzAAChGzLcZLr3r2qrorFYejpc/d0z/y7H1qB07/p21EaOmn/qu+HD1H/5t/9B3+bpP+ynRhFuWP3b/ocww3eMdw79vqeG9xcj25Y7v3H4XA0ZJkh9/8z7A3D9K/6yHrNW7aDnJQ8T34kcOvHqR+F/Zu+FCaphVAPRkGMH+pf9ZPUSrMEA11+56ofRqmDL30PjSjb9t7Ld/c9K457ftIDmY9sP3T/v9591Nv5zr6Xeg692kORm1z1tll48z38HkaQXOgB+IHio/fu3UOEULTHd+UodXqpZ6W9HH/iM/l44IRpb+8j1Ns6cbRNe9/7d9utFFiu8y1D6Hu/Z4V273u/usJbcPP14eF7v5eFqY9qsPhJNcn3va8hdLrvXdHP+3hA+mXg9KwsjQIr9aGFUN7bRgg5di/K0vf9H1d96FnbFNM0cFLLtlIL/92m+87ZJhTjzHvmPXtCh9vexEFBj4zVS6MCLjw5SoUK5ciHFn4n6V/1N06+j7Z20r/5R3+L5xs4+HLx0X9e9a3YV6sP77j+Vd8KwygtBrj5N4X9X9kW9W/6XprGyc2HD66fehl4D6PgQxQ7YeL5D+k7z0HBO/J08qH4Z+sgx0qc5IMd7UMUWfaHrWN7VvqOfv8dmWjXtfepe+j/+HHVRxHc9G/CDKGrfuoEMbIIl/2jQl918YP89f5u+T59xLikOO47gySVRBRIwnBpao/I0GU026DDUhsebHGcAIEfPSxiEwzUXBKGXxV14Ro6v5dEdJDEKWtpjxtLG4bSkl+BnOcsTR1IEyUyl7xvaygxBT4BnH2YCXxua9cfBTfeGTm9JonT9WyQ/k0SUWZwj6wprlwpUHa2OES2MMwMjUWSf5tJE3YkCUxqBqMEiKOB4MZfwUBB+DuGvnAdbcRCn78zdT4BA5Sa2pCRJnYMxL0LA3UPBlNGEqZjGE6nQBuHpsqzQNz7kjjOTOHWX2qsZ3LqwtYek0UwXlvMKDB9ybW1IWNpe9cWPVTOlc5b3gGdT0xdQTOf/wYCGbXmHMOcXwOO3QNRZczlvoQxJt9f8gNLe0wkcYokccza4ig1dCU2uHECJhsXknmqG0kcsbZw/YXSSM1MQou//73/46qDuP/yHBQ72+R9GqM2fRVkBigzl7e+KY4YEKjMLBh6QFv5ksBi+v7NyfmNqZmerT0yTV4gz7kzZHpgoiKYD0MgjnxD22cgGKfmasSZ+jS3NEwPdiSE6d9mSx6BYOHOdHYkuHhsxjVFNbC0IZKE6QYMlMzUFxkQx76pPR4k/zZ90JkvlqgmYDk8WMJobYnj3P4cuc4gcWcbOTL0KVPCgp/F1y1tuAYTddOYVygjIKprWxzl98fxsxpssenfZgvO82C4yBY6v1cDNYcc2gGf8LoHJ7eZNVB9U59mmMYwH8YgJ/M8WNoo++rzf3Pys2N8kiZjlvJnEx8Ybiw7syBljUDNMbymPs8s7dMOaOPM0GxkCqzv3BfzRlM8Fw9yq2zFqbkdoLW5PMIIBjwCoRxZmsnMArSbDaYsCJUYaKuPkljI0W2Bf224IfA8QQ/IqYmpyQwYTOeGNkVgMi/54xxOiIxSfPAxCOTExnxQpzHmkaXkzp7GbQxCmTG04drsmPs9GYMv+LSYC4Er+nKev1GK8Xlffl+ntKDPjlkwWbhfntE7X5a3mRqME1tTD+V4hdxgWn6kcFZyUcj2kDE0WPJoIbeEv8/ILTFSMAoffPd9bgWoUzdrhvbIYl4xQjUG4iIztaFnBI+I6oTYgyLSavxZ6KHhDopqBjkvNsMF4TN7fffF4lBmfo7cBRlL+Qy4YWBp8AvQVMbQFM8W7z4K/q1LaFfQo1PWKh1yTeYrCXxS8A15XxJ4Qq/udx89I1ATmBPe+CSJwxgECgLhwpXpnA5QVNdf3cglenDCs/bn6Isk3LrSRNmR6wL5xtbShptSJo/0ovp6FhTvCkPyHJEBAtutK4kE/o/S5KwNojRdNedZ0YXaj8jU7+p8UOHe1pW9rS8ygm/h1lfE0dri1Jzam5Xf5K8TePe2LkcrTl3DQGWOcPON6zU81GTxwnFbOkoS+AO29wa3KunIODqPx55Y+oLSQLTjih7CrWvr0/jst0M1t4j8VrDmpmZQvHfwNgzUoK2vT+cj70CoIGei3Fm6VMCN4WpcP/sNgxtltqhe23dKDP2WbilwDQdKbugKMgebNh7uC/wU/CjvbH26NlWZgcGYMTV7WKNLAINRevyNYUxjECw+SndUp6wGSm5q41QVx2HFm0JT/jr4i/Im+abqAVXxzLzQXQX8T+kPi/o89/ZkigyNXRkazGxdRqsA24DxgfK8eoDiuLA5dfhr987pa13eG5rcgkVNILc60qrGt3DAe5jfztGrC15w/juFrxS5eLyzPBRiP/Dx3tTk3NQXDbjgRE3AWEbGYIrqfP6snHWOyZ/wFsWjmtnr6MtRTxpddEYgNNo4I8/bEz6RBI8BLPKAtj/3bkz0lsBnWP8R3/iz+zRcwq07W5bzvPWVu9HqfqOudFZeLdTpSlX5h9V4+m25UT+rglSDC8sCheE8fmQG+3K2zi9gMPo/2N+QJnsX6urOFldN/dqtGwrdpB4qOa1dK+WMjDFRdpo2ToHQBccUQW7EwkBGMO+0Pye4sSfT2ORsBMPSvt3ibwyhuPoo3ck7EJixyciRoQ1Ds/V+t+2nZj+w4hdYflNU90AcDWfBeA/FRxwtMNjaryZTbOVzW0SY8ggEY59EDxjqYLy3VBPBUI4Bd/1RmhiPQlBqnxJi1oO3cWrqimeKY8a4Jxx5oWeHrXRh0Q+llaaS1/luwzPfuwB7b1gahFtqkYKjF4I9ZiCiNY6QAHedwfp8D5H7IDD17zFQbEjChkaFm6w5zhBz397Up4yFuSLkc+xNw1CJTX1+ziN5AUXtKuPSIqmh83EVJKwjMi2Yj7j5Ii4dmEYAKwQsssXxxtAVjzpBzzorjYCZOHAFmhtC0f1ucnT428pi0eX0dBhkNTO0aWJq2LFWH7uEFwxUBk6w80dY0JU2pwZQ8jelcvJAMNzZImXziq3E0hEf7U1teLBEFBBHLR8xMEChyal5gy2EW1fb14zXJGkqLGFKDLS0jv9WV4DFPeo0rqMySVA3QP7sNmo9f+sXYvFRCi9yKv3mt/nRydhYVLXHzQrjQ8Djy3tTm2MnJoXio2eLqwwOeGTkwzXgcBCCMhwQaAfeLoMXqe6ETI740dvKjo79sVCuhdjMT4xzpZKwMqUq6aiUq2BSKp2n1NCVtXXXUhNP8+WBJCIGyg5unggYShU0URDQ+QQ7bSXPt4MaykOnMLw2WPkqHJ0jgv9D9OVp9Y0ySydBV0WjplNwer/hPNbU3JdOA6dgWuycJYYRMQvi6I5jgHVPvjkDf3zGQSHOtGdpw7rRazkI/MDUWk5AIaNPmY/Cofva1lk1epDEHSzxOZl6ILBRN//xOxgqh9Mx8P9M05E+BvAtPUC+WZe
type: helm.sh/release.v1

Decoded json:

{
  "name": "dotnet",
  "info": {
    "first_deployed": "2023-02-14T23:49:12.655951052+01:00",
    "last_deployed": "2023-02-14T23:49:12.655951052+01:00",
    "deleted": "",
    "description": "Install complete",
    "status": "deployed",
    "notes": "\nYour .NET app is building! To view the build logs, run:\n\noc logs bc/dotnet --follow\n\nNote that your Deployment will report \"ErrImagePull\" and \"ImagePullBackOff\" until the build is complete. Once the build is complete, your image will be automatically rolled out."
  },
  "chart": {
    "metadata": {
      "name": "dotnet",
      "version": "0.0.1",
      "description": "A Helm chart to build and deploy .NET applications",
      "keywords": [
        "runtimes",
        "dotnet"
      ],
      "apiVersion": "v2",
      "annotations": {
        "chart_url": "https://github.com/openshift-helm-charts/charts/releases/download/redhat-dotnet-0.0.1/redhat-dotnet-0.0.1.tgz"
      }
    },
    "lock": null,
    "templates": [
      /* removed */
    ],
    "values": {
      "build": {
        "contextDir": null,
        "enabled": true,
        "env": null,
        "imageStreamTag": {
          "name": "dotnet:3.1",
          "namespace": "openshift",
          "useReleaseNamespace": false
        },
        "output": {
          "kind": "ImageStreamTag",
          "pushSecret": null
        },
        "pullSecret": null,
        "ref": "dotnetcore-3.1",
        "resources": null,
        "startupProject": "app",
        "uri": "https://github.com/redhat-developer/s2i-dotnetcore-ex"
      },
      "deploy": {
        "applicationProperties": {
          "enabled": false,
          "mountPath": "/deployments/config/",
          "properties": "## Properties go here"
        },
        "env": null,
        "envFrom": null,
        "extraContainers": null,
        "initContainers": null,
        "livenessProbe": {
          "tcpSocket": {
            "port": "http"
          }
        },
        "ports": [
          {
            "name": "http",
            "port": 8080,
            "protocol": "TCP",
            "targetPort": 8080
          }
        ],
        "readinessProbe": {
          "httpGet": {
            "path": "/",
            "port": "http"
          }
        },
        "replicas": 1,
        "resources": null,
        "route": {
          "enabled": true,
          "targetPort": "http",
          "tls": {
            "caCertificate": null,
            "certificate": null,
            "destinationCACertificate": null,
            "enabled": true,
            "insecureEdgeTerminationPolicy": "Redirect",
            "key": null,
            "termination": "edge"
          }
        },
        "serviceType": "ClusterIP",
        "volumeMounts": null,
        "volumes": null
      },
      "global": {
        "nameOverride": null
      },
      "image": {
        "name": null,
        "tag": "latest"
      }
    },
    "schema": "removed",
    "files": [
      {
        "name": "README.md",
        "data": "removed"
      }
    ]
  },
  "config": {
    "build": {
      "enabled": true,
      "imageStreamTag": {
        "name": "dotnet:3.1",
        "namespace": "openshift",
        "useReleaseNamespace": false
      },
      "output": {
        "kind": "ImageStreamTag"
      },
      "ref": "dotnetcore-3.1",
      "startupProject": "app",
      "uri": "https://github.com/redhat-developer/s2i-dotnetcore-ex"
    },
    "deploy": {
      "applicationProperties": {
        "enabled": false,
        "mountPath": "/deployments/config/",
        "properties": "## Properties go here"
      },
      "livenessProbe": {
        "tcpSocket": {
          "port": "http"
        }
      },
      "ports": [
        {
          "name": "http",
          "port": 8080,
          "protocol": "TCP",
          "targetPort": 8080
        }
      ],
      "readinessProbe": {
        "httpGet": {
          "path": "/",
          "port": "http"
        }
      },
      "replicas": 1,
      "route": {
        "enabled": true,
        "targetPort": "http",
        "tls": {
          "enabled": true,
          "insecureEdgeTerminationPolicy": "Redirect",
          "termination": "edge"
        }
      },
      "serviceType": "ClusterIP"
    },
    "image": {
      "tag": "latest"
    }
  },
  "manifest": "---\n# Source: dotnet/templates/service.yaml\napiVersion: v1\nkind: Service\nmetadata:\n  name: dotnet\n  labels:\n    helm.sh/chart: dotnet\n    app.kubernetes.io/name: dotnet\n    app.kubernetes.io/instance: dotnet\n    app.kubernetes.io/managed-by: Helm\n    app.openshift.io/runtime: dotnet\nspec:\n  type: ClusterIP\n  selector:\n    app.kubernetes.io/name: dotnet\n    app.kubernetes.io/instance: dotnet\n  ports:\n    - name: http\n      port: 8080\n      protocol: TCP\n      targetPort: 8080\n---\n# Source: dotnet/templates/deployment.yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: dotnet\n  labels:\n    helm.sh/chart: dotnet\n    app.kubernetes.io/name: dotnet\n    app.kubernetes.io/instance: dotnet\n    app.kubernetes.io/managed-by: Helm\n    app.openshift.io/runtime: dotnet\n  annotations:\n    image.openshift.io/triggers: |-\n      [\n        {\n          \"from\":{\n            \"kind\":\"ImageStreamTag\",\n            \"name\":\"dotnet:latest\"\n          },\n          \"fieldPath\":\"spec.template.spec.containers[0].image\"\n        }\n      ]\nspec:\n  replicas: 1\n  selector:\n    matchLabels:\n      app.kubernetes.io/name: dotnet\n      app.kubernetes.io/instance: dotnet\n  template:\n    metadata:\n      labels:\n        helm.sh/chart: dotnet\n        app.kubernetes.io/name: dotnet\n        app.kubernetes.io/instance: dotnet\n        app.kubernetes.io/managed-by: Helm\n        app.openshift.io/runtime: dotnet\n    spec:\n      containers:\n        - name: web\n          image: dotnet:latest\n          ports:\n            - name: http\n              containerPort: 8080\n              protocol: TCP\n          livenessProbe:\n            tcpSocket:\n              port: http\n          readinessProbe:\n            httpGet:\n              path: /\n              port: http\n          volumeMounts:\n      volumes:\n---\n# Source: dotnet/templates/buildconfig.yaml\napiVersion: build.openshift.io/v1\nkind: BuildConfig\nmetadata:\n  name: dotnet\n  labels:\n    helm.sh/chart: dotnet\n    app.kubernetes.io/name: dotnet\n    app.kubernetes.io/instance: dotnet\n    app.kubernetes.io/managed-by: Helm\n    app.openshift.io/runtime: dotnet\nspec:\n  output:\n    to:\n      kind: ImageStreamTag\n      name: dotnet:latest\n  source:\n    type: Git\n    git:\n      uri: https://github.com/redhat-developer/s2i-dotnetcore-ex\n      ref: dotnetcore-3.1\n  strategy:\n    type: Source\n    sourceStrategy:\n      from:\n        kind: ImageStreamTag\n        name: dotnet:3.1\n        namespace: openshift\n      env:\n        - name: \"DOTNET_STARTUP_PROJECT\"\n          value: \"app\"\n  triggers:\n    - type: ConfigChange\n---\n# Source: dotnet/templates/imagestream.yaml\napiVersion: image.openshift.io/v1\nkind: ImageStream\nmetadata:\n  name: dotnet\n  labels:\n    helm.sh/chart: dotnet\n    app.kubernetes.io/name: dotnet\n    app.kubernetes.io/instance: dotnet\n    app.kubernetes.io/managed-by: Helm\n    app.openshift.io/runtime: dotnet\nspec:\n  lookupPolicy:\n    local: true\n---\n# Source: dotnet/templates/route.yaml\napiVersion: route.openshift.io/v1\nkind: Route\nmetadata:\n  name: dotnet\n  labels:\n    helm.sh/chart: dotnet\n    app.kubernetes.io/name: dotnet\n    app.kubernetes.io/instance: dotnet\n    app.kubernetes.io/managed-by: Helm\n    app.openshift.io/runtime: dotnet\nspec:\n  to:\n    kind: Service\n    name: dotnet\n  port:\n    targetPort: http\n  tls:\n    termination: edge\n    insecureEdgeTerminationPolicy: Redirect\n",
  "version": 1
}

Description of the problem:

Currently the `pre-network-manager-config.service` that we use to create static network configurations from the non-minimal discovery ISO may run after NetworkManager, and therefore the configurations that it generates may be ignored.
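A minimal sketch of the kind of systemd ordering constraint that would address this (unit names assumed from the description above; this is not necessarily the merged fix):

[Unit]
# Hypothetical drop-in for pre-network-manager-config.service:
# ensure it completes before NetworkManager brings up interfaces
Wants=network-pre.target
Before=network-pre.target
Before=NetworkManager.service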

How reproducible:

Not always reproducible; it is timing-sensitive. It has been observed when there is a large number of static network configurations. See OCPBUGS-16219 for details and steps to reproduce.

 This will allow the installer to depend on just the client/api/models modules, and not pull in all of the dependencies of the service (such as libnmstate).

Description of problem:

When creating a machine that attaches Azure Ultra Disks as data disks in an Arm cluster, the machine reaches the Provisioned phase, but checking in the Azure web console shows the instance failed with the error ZonalAllocationFailed.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-arm64-2023-03-22-204044

How reproducible:

Always

Steps to Reproduce:


/// Not Needed up to point 6 ////

1. Make sure the storage class below is already present:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ultra-disk-sc
provisioner: disk.csi.azure.com # replace with "kubernetes.io/azure-disk" if aks version is less than 1.21
volumeBindingMode: WaitForFirstConsumer # optional, but recommended if you want to wait until the pod that will use this disk is created 
parameters:
  skuname: UltraSSD_LRS
  kind: managed
  cachingMode: None
  diskIopsReadWrite: "2000"  # minimum value: 2 IOPS/GiB 
  diskMbpsReadWrite: "320"   # minimum value: 0.032/GiB
2. Create a new custom secret based on the worker-user-data secret; first extract its userData:
$ oc -n openshift-machine-api get secret worker-user-data --template='{{index .data.userData | base64decode}}' | jq > userData.txt
3. Edit userData.txt by adding the section below just before the closing '}', preceded by a comma:
"storage": {
  "disks": [
    {
      "device": "/dev/disk/azure/scsi1/lun0",
      "partitions": [
        {
          "label": "lun0p1",
          "sizeMiB": 1024,
          "startMiB": 0
        }
      ]
    }
  ],
  "filesystems": [
    {
      "device": "/dev/disk/by-partlabel/lun0p1",
      "format": "xfs",
      "path": "/var/lib/lun0p1"
    }
  ]
},
"systemd": {
  "units": [
    {
      "contents": "[Unit]\nBefore=local-fs.target\n[Mount]\nWhere=/var/lib/lun0p1\nWhat=/dev/disk/by-partlabel/lun0p1\nOptions=defaults,pquota\n[Install]\nWantedBy=local-fs.target\n",
      "enabled": true,
      "name": "var-lib-lun0p1.mount"
    }
  ]
}
4. Extract the disableTemplating value:
$ oc -n openshift-machine-api get secret worker-user-data --template='{{index .data.disableTemplating | base64decode}}' | jq > disableTemplating.txt
5. Merge the two files to create the new user-data secret to be used:
$ oc -n openshift-machine-api create secret generic worker-user-data-x5 --from-file=userData=userData.txt --from-file=disableTemplating=disableTemplating.txt 


/// Not needed up to here ///

6. Modify the new MachineSet YAML, adding the dataDisks below as a separate field alongside the osDisk:
          dataDisks:
          - nameSuffix: ultrassd
            lun: 0
            diskSizeGB: 4 # The same issue on the machine status fields is reproducible on x86_64 by setting 65535 to overcome the maximum limits of the Azure accounts we use.
            cachingType: None
            deletionPolicy: Delete
            managedDisk:
              storageAccountType: UltraSSD_LRS
7. Scale up the MachineSet, or delete an existing machine to force reprovisioning.

Actual results:

The machine is stuck in the Provisioned phase, but checking in Azure shows that it failed:
$ oc get machine -o wide                
NAME                                        PHASE         TYPE               REGION      ZONE   AGE     NODE                                        PROVIDERID                                                                                                                                                                              STATE
zhsunaz3231-lds8h-master-0                  Running       Standard_D8ps_v5   centralus   1      4h15m   zhsunaz3231-lds8h-master-0                  azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunaz3231-lds8h-rg/providers/Microsoft.Compute/virtualMachines/zhsunaz3231-lds8h-master-0                  Running
zhsunaz3231-lds8h-master-1                  Running       Standard_D8ps_v5   centralus   2      4h15m   zhsunaz3231-lds8h-master-1                  azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunaz3231-lds8h-rg/providers/Microsoft.Compute/virtualMachines/zhsunaz3231-lds8h-master-1                  Running
zhsunaz3231-lds8h-master-2                  Running       Standard_D8ps_v5   centralus   3      4h15m   zhsunaz3231-lds8h-master-2                  azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunaz3231-lds8h-rg/providers/Microsoft.Compute/virtualMachines/zhsunaz3231-lds8h-master-2                  Running
zhsunaz3231-lds8h-worker-centralus1-sfhs7   Provisioned   Standard_D4ps_v5   centralus   1      3m23s                                               azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/zhsunaz3231-lds8h-rg/providers/Microsoft.Compute/virtualMachines/zhsunaz3231-lds8h-worker-centralus1-sfhs7   Creating

$ oc get machine zhsunaz3231-lds8h-worker-centralus1-sfhs7 -o yaml
  - lastTransitionTime: "2023-03-23T06:07:32Z"
    message: 'Failed to check if machine exists: vm for machine zhsunaz3231-lds8h-worker-centralus1-sfhs7
      exists, but has unexpected ''Failed'' provisioning state'
    reason: ErrorCheckingProvider
    status: Unknown
    type: InstanceExists
  - lastTransitionTime: "2023-03-23T06:07:05Z"
    status: "True"
    type: Terminable
  lastUpdated: "2023-03-23T06:07:32Z"
  phase: Provisioned
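To confirm the failure on the Azure side without the web console, an illustrative check (resource group and VM name taken from the output above) is:

$ az vm show -g zhsunaz3231-lds8h-rg -n zhsunaz3231-lds8h-worker-centralus1-sfhs7 --query provisioningState -o tsv

This is expected to report the 'Failed' provisioning state referenced in the machine condition.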

Expected results:

The machine should go to the Failed phase if provisioning failed in Azure.

Additional info:

must-gather: https://drive.google.com/file/d/1z1gyJg4NBT8JK2-aGvQCruJidDHs0DV6/view?usp=sharing

Description of problem:

Reported in https://github.com/openshift/cluster-ingress-operator/issues/911

When you open a new issue, it still directs you to Bugzilla, and then doesn't work.

It can be changed here: https://github.com/openshift/cluster-ingress-operator/blob/master/.github/ISSUE_TEMPLATE/config.yml, but to what?

The correct Jira link is
https://issues.redhat.com/secure/CreateIssueDetails!init.jspa?pid=12332330&issuetype=1&components=12367900&priority=10300&customfield_12316142=26752

But can the public use this mechanism? Yes - https://redhat-internal.slack.com/archives/CB90SDCAK/p1682527645965899 

Version-Release number of selected component (if applicable):

n/a

How reproducible:

May be in other repos too.

Steps to Reproduce:

1. Open Issue in the repo - click on New Issue
2. Follow directions and click on link to open Bugzilla
3. Get message that this doesn't work anymore

Actual results:

You get instructions that don't work to open a bug from an Issue.

Expected results:

You get instructions to just open an Issue, or get correct instructions on how to open a bug using Jira.

Additional info:

 

Description of problem:

4.14 indexes have been bootstrapped and published on the registry. I was told they have to be added to https://github.com/operator-framework/operator-marketplace/blob/master/defaults/03_community_operators.yaml before they can be used in OCP clusters.

Version-Release number of selected component (if applicable):

OCP 4.14

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

4.14 indexes were bootstrapped in CLOUDDST-17591

Description of problem:

When working with HorizontalNav, the component doesn't re-render when the location changes; currently it only updates itself when basePath changes. The location-change-based re-render was previously triggered by the withRouter HoC, which was recently removed.
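A minimal sketch (not the console's actual fix) of how a consumer could force a re-render on navigation, assuming react-router's useLocation is available to the plugin:

import * as React from 'react';
import { useLocation } from 'react-router-dom';
import { HorizontalNav } from '@openshift-console/dynamic-plugin-sdk';

// Wrapper that subscribes to the router location and remounts HorizontalNav
// (via the key prop) whenever the pathname changes, so the active tab updates.
const TabsWithLocation: React.FC<React.ComponentProps<typeof HorizontalNav>> = (props) => {
  const location = useLocation();
  return <HorizontalNav key={location.pathname} {...props} />;
};

export default TabsWithLocation;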

Version-Release number of selected component (if applicable):

4.13

How reproducible:

1/1

Steps to Reproduce:

1. Go to Storage -> ODF (version 4.13-pre-release)
2. Click on Storage System Tab and then Topology tab
3.

Actual results:

The selected tab doesn't get highlighted as active tab.

Expected results:

The selected tab should have the active blue color.

Additional info:

 

Description of problem:

ROSA is being branded via custom branding; as a result, the favicon disappears since we do not want any Red Hat/Openshift-specific branding to appear when custom branding is in use.  Since ROSA is a Red Hat product, it should get a branding option added to the console so all the correct branding including favicon appears.

Version-Release number of selected component (if applicable):

4.14.0, 4.13.z, 4.12.z, 4.11.z

How reproducible:

Always

Steps to Reproduce:

1.  View a ROSA cluster
2.  Note the absence of the OpenShift logo favicon

Description of problem:

When using the agent-based installer to provision OCP on bare metal, some of the machines fail to use the static nmconnection files and get IP addresses via DHCP instead.
This may cause the network validation to fail.

Version-Release number of selected component (if applicable):

4.13.3

How reproducible:

100%

Steps to Reproduce:

1. Generate agent iso
2. Mount it to BMC and reboot from live cd
3. Use 'openshift-install agent wait-for' to monitor the progress

Actual results:

Network validation fails due to an overlapping IP address.

Expected results:

Validation succeeds.

Additional info:

 

Description of problem:

When deploying a whereabouts-IPAM-based additional network through the cluster-network-operator, the whereabouts-reconciler daemonset is not deployed on non-amd64 clusters due to a hard-coded nodeSelector introduced by https://github.com/openshift/cluster-network-operator/commit/be095d8c378e177d625a92aeca4e919ed0b5a14f

Version-Release number of selected component (if applicable):

4.13+

How reproducible:

Always. Tested on a connected arm64 AWS cluster using the openshift-sdn network

Steps to Reproduce:

1. oc new-project test1
2. oc patch networks.operator.openshift.io/cluster -p '{"spec":{"additionalNetworks":[{"name":"tertiary-net2","namespace":"test1","rawCNIConfig":"{\n  \"cniVersion\": \"0.3.1\",\n  \"name\": \"test\",\n  \"type\": \"macvlan\",\n  \"master\": \"bond0.100\",\n  \"ipam\": {\n    \"type\": \"whereabouts\",\n    \"range\": \"10.10.10.0/24\"\n  }\n}","type":"Raw"}],"useMultiNetworkPolicy":true}}' --type=merge
3. oc get daemonsets -n openshift-multus 

Actual results:

NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR              AGE
whereabouts-reconciler          0         0         0       0            0           kubernetes.io/arch=amd64   7m27s

Expected results:

No kubernetes.io/arch=amd64 nodeSelector is set, so that non-amd64 and multi-arch compute clusters can schedule the daemonset on each node, regardless of architecture.

Additional info:

Same problem on s390x
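A quick way to confirm the hard-coded selector (illustrative command; the field path is assumed):

$ oc -n openshift-multus get ds whereabouts-reconciler -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}'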

Description of the problem:

When creating/updating an InfraEnv, the size of compressed ignition should be validated.
I.e. the service should generate the entire ignition for each request, compress it (as done in the ignition archive), and ensure its size is at most 256 KiB.
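As a rough illustration of the intended check (gzip used as a stand-in for the archive compression; the file name is a placeholder):

$ gzip -c generated-ignition.json | wc -c    # should be at most 262144 bytes (256 KiB)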

Notes:

  • The validation added by MGMT-13008 is performed directly on the `IgnitionConfigOverride` property. Thus, the validation isn't accurate as it should be done on the entire generated ignition config.
  • See full discussion here.
  • Related issue: MGMT-13643

How reproducible:

100%

Steps to reproduce:

1. Register an InfraEnv that would result in an ignition archive larger than 256 KiB.
E.g. Invoke 'POST /v2/infra-envs' with large values in body (infra-env-create-params)

Actual results:

The register request succeeds, but downloading the ISO fails.

Expected results:

The request should fail with an error message explaining the generated ignition archive is too large.

Description of the problem:

After creating successfully a hosted cluster using CAPI agent with 6 worker nodes (on two different subnets), I attempted to scale down the nodepool to 0 replicas.

2 agents returned to the InfraEnv in the "known-unbound" state, but the other 4 are still bound to the cluster, and their related Machine CRs are stuck in the Deleting phase.

$ oc get machines.cluster.x-k8s.io -n clusters-hosted-1
NAME                        CLUSTER          NODENAME            PROVIDERID                                     PHASE      AGE   VERSION
hosted-1-6655884866-dr4mv   hosted-1-vhc4f   hosted-rwn-1-1      agent://4cc93549-45cd-42a9-8c61-5d72b802ebe5   Deleting   94m   4.14.0-ec.3
hosted-1-6655884866-fkfjf   hosted-1-vhc4f   hosted-worker-1-0   agent://324afeeb-1af1-45d9-a2ba-f1101ffb6a6b   Deleting   94m   4.14.0-ec.3
hosted-1-6655884866-nzflz   hosted-1-vhc4f   hosted-rwn-1-2      agent://50b12199-7e95-4b3a-a5ce-d4aa0fa7909e   Deleting   94m   4.14.0-ec.3
hosted-1-6655884866-pc67l   hosted-1-vhc4f   hosted-worker-1-2   agent://284eb9e6-4375-4e59-9a11-a0a3131aa08b   Deleting   94m   4.14.0-ec.3 

In the capi-provider pod logs I have the following:

time="2023-07-25T15:23:27Z" level=error msg="failed to add finalizer agentmachine.agent-install.openshift.io/deprovision to resource hosted-1-2ntnh clusters-hosted-1" func="github.com/openshift/cluster-api-provider-agent/controllers.(*AgentMachineReconciler).handleDeletionHook" file="/remote-source/app/controllers/agentmachine_controller.go:206" agent_machine=hosted-1-2ntnh agent_machine_namespace=clusters-hosted-1 error="Operation cannot be fulfilled on agentmachines.capi-provider.agent-install.openshift.io \"hosted-1-2ntnh\": StorageError: invalid object, Code: 4, Key: /kubernetes.io/capi-provider.agent-install.openshift.io/agentmachines/clusters-hosted-1/hosted-1-2ntnh, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 75febba6-8e98-4fca-861f-e83c467a3368, UID in object meta: " 

and

time="2023-07-25T15:23:50Z" level=error msg="Failed to get agentMachine clusters-hosted-1/hosted-1-l4pp7" func="github.com/openshift/cluster-api-provider-agent/controllers.(*AgentMachineReconciler).Reconcile" file="/remote-source/app/controllers/agentmachine_controller.go:95" agent_machine=hosted-1-l4pp7 agent_machine_namespace=clusters-hosted-1 error="AgentMachine.capi-provider.agent-install.openshift.io \"hosted-1-l4pp7\" not found" 
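An illustrative way to check whether the deprovision finalizer was ever added to the stuck AgentMachine resources (the jsonpath is an assumption, not taken from the report):

$ oc -n clusters-hosted-1 get agentmachines.capi-provider.agent-install.openshift.io -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.finalizers}{"\n"}{end}'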

Actual results:

4 out of 6 agents are still bound to cluster

Expected results:

The nodepool is scaled to 0 replicas

Please review the following PR: https://github.com/openshift/operator-framework-olm/pull/470

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

We currently do some frontend logic to list and search CatalogSources for the source associated with the CSV and Subscription on the CSV details page. If we can't find the CatalogSource, we show an error message and prevent updates from the Subscription tab. 

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Create an htpasswd idp with any user
2. Create a project admin role binding for this user
3. Install an operator in the namespace where the user has project admin permissions
4. Visit the CSV details page while logged in as the project admin user
5. View the subscriptions tab

Actual results:

An alert is shown indicating that the CatalogSource is missing, and updates to the operator are prevented.

Expected results:

If the Subscription shows the catalog source as healthy in its status stanza, we shouldn't show an alert or prevent updates.
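For reference, the health information the console could rely on is reported in the Subscription status; a hypothetical check (namespace and name are placeholders):

$ oc -n <namespace> get subscription <name> -o jsonpath='{.status.catalogHealth}'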

Additional info:

Reproducing this bug is dependent on the fix for OCPBUGS-3036 which prevents project admin users from viewing the Subscription tab at all. 

 

Description of problem:

After updating the sysctl config map, the test waits up to 30s for the pod to be in ready state. From the logs, it could be seen that the allowlist controller takes more than 30s to reconcile when multiple tests are running in parallel.

The internal logic of the allowlist controller waits up to 60s for the pods of the allowlist DS to be running. Therefore, it is logical to increase the timeout in the test to 60s.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-18841. The following is the description of the original issue:

Description of problem:

Failed to run automated case OCP-57089 on a 4.14 Azure platform; when checked manually, the created load-balancer service couldn't get an external IP address.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-09-164123

How reproducible:

100% on the cluster

Steps to Reproduce:

1. Add a wait in the auto script, then run the case
      g.By("check if the lb services have obtained the EXTERNAL-IPs")
      regExp := "([0-9]+.[0-9]+.[0-9]+.[0-9]+)"
      time.Sleep(3600 * time.Second) 
% ./bin/extended-platform-tests run all --dry-run | grep 57089 | ./bin/extended-platform-tests run -f -

2.
% oc get ns | grep e2e-test-router
e2e-test-router-ingressclass-n2z2c                 Active   2m51s 

3. It was pending in EXTERNAL-IP column for internal-lb-57089 service
% oc -n e2e-test-router-ingressclass-n2z2c get svc
NAME                TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)           AGE
external-lb-57089   LoadBalancer   172.30.198.7    20.42.34.61   28443:30193/TCP   3m6s
internal-lb-57089   LoadBalancer   172.30.214.30   <pending>     29443:31507/TCP   3m6s
service-secure      ClusterIP      172.30.47.70    <none>        27443/TCP         3m13s
service-unsecure    ClusterIP      172.30.175.59   <none>        27017/TCP         3m13s
% 

4.
% oc -n e2e-test-router-ingressclass-n2z2c get svc internal-lb-57089 -oyaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
  creationTimestamp: "2023-09-12T07:56:42Z"
  finalizers:
  - service.kubernetes.io/load-balancer-cleanup
  name: internal-lb-57089
  namespace: e2e-test-router-ingressclass-n2z2c
  resourceVersion: "209376"
  uid: b163bc03-b1c6-4e7b-b4e1-c996e9d135f4
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.30.214.30
  clusterIPs:
  - 172.30.214.30
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: https
    nodePort: 31507
    port: 29443
    protocol: TCP
    targetPort: 8443
  selector:
    name: web-server-rc
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}
%

Actual results:

internal-lb-57089 service couldn't get an external-IP address

Expected results:

internal-lb-57089 service can get an external-IP address

Additional info:

 

Description of problem:

There is an error when creating the image:
FATAL failed to fetch Agent Installer ISO: failed to generate asset "Agent Installer ISO": stat /home/core/.cache/agent/files_cache/libnmstate.so.1.3.3: no such file or directory

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-04-06-060829

How reproducible:

always

Steps to Reproduce:

1. Prepare the agent-config.yaml and install-config.yaml files

2. Run 'bin/openshift-install agent create image --log-level debug'

3. There is following output with errors:
DEBUG extracting /usr/bin/agent-tui to /home/core/.cache/agent/files_cache, oc image extract --path /usr/bin/agent-tui:/home/core/.cache/agent/files_cache --confirm quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c11d31d47db4afb03e4a4c8c40e7933981a2e3a7ef9805a1413c441f492b869b 
DEBUG Fetching image from OCP release (oc adm release info --image-for=agent-installer-node-agent --insecure=true registry.ci.openshift.org/ocp/release@sha256:83caa0a8f2633f6f724c4feb517576181d3f76b8b76438ff752204e8c7152bac) 
DEBUG extracting /usr/lib64/libnmstate.so.1.3.3 to /home/core/.cache/agent/files_cache, oc image extract --path /usr/lib64/libnmstate.so.1.3.3:/home/core/.cache/agent/files_cache --confirm quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c11d31d47db4afb03e4a4c8c40e7933981a2e3a7ef9805a1413c441f492b869b 
DEBUG File /usr/lib64/libnmstate.so.1.3.3 was not found, err stat /home/core/.cache/agent/files_cache/libnmstate.so.1.3.3: no such file or directory 
ERROR failed to write asset (Agent Installer ISO) to disk: cannot generate ISO image due to configuration errors 
FATAL failed to fetch Agent Installer ISO: failed to generate asset "Agent Installer ISO": stat /home/core/.cache/agent/files_cache/libnmstate.so.1.3.3: no such file or directory  
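To manually verify whether the library is present in the node-agent image, an illustrative check (mirroring the extract command in the debug output above; the destination directory is arbitrary) is:

$ mkdir -p /tmp/nmstate-check
$ oc image extract --path /usr/lib64/:/tmp/nmstate-check --confirm quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c11d31d47db4afb03e4a4c8c40e7933981a2e3a7ef9805a1413c441f492b869b
$ ls /tmp/nmstate-check | grep libnmstate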

Actual results:

The image generation fails.

Expected results:

The image should be generated successfully.

Additional info:

 

Description of problem:

The OpenShift DNS daemonset has the rolling update strategy. The "maxSurge" parameter is set to a non-zero value, which means that the "maxUnavailable" parameter is set to zero. When the user replaces the toleration in the daemonset's template spec (via the OpenShift DNS config API) from the one which allows scheduling on the master nodes to any other toleration, the new pods still try to be scheduled on the master nodes. The old pods from the still-tolerated nodes may be lucky enough to be recreated, but only if they are processed before any pod from a node that is no longer tolerated.

The new pods are not expected to be scheduled on the nodes which are not tolerated by the new damonset's template spec. The daemonset controller should just delete the old pods from the nodes which cannot be tolerated anymore. The old pods from the nodes which can still be tolerated should be recreated according to the rolling update parameters.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:
1. Create the daemonset which tolerates "node-role.kubernetes.io/master" taint and has the following rolling update parameters:

$ oc -n openshift-dns get ds dns-default -o yaml | yq .spec.updateStrategy
rollingUpdate:
  maxSurge: 10%
  maxUnavailable: 0
type: RollingUpdate

$ oc  -n openshift-dns get ds dns-default -o yaml | yq .spec.template.spec.tolerations
- key: node-role.kubernetes.io/master
  operator: Exists

2. Let the daemonset be scheduled on all the target nodes (e.g. all masters and all workers)

$ oc -n openshift-dns get pods  -o wide | grep dns-default
dns-default-6bfmf     2/2     Running   0          119m    10.129.0.40   ci-ln-sb5ply2-72292-qlhc8-master-2         <none>           <none>
dns-default-9cjdf     2/2     Running   0          2m35s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>           <none>
dns-default-c6j9x     2/2     Running   0          119m    10.128.0.13   ci-ln-sb5ply2-72292-qlhc8-master-0         <none>           <none>
dns-default-fhqrs     2/2     Running   0          2m12s   10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>           <none>
dns-default-lx2nf     2/2     Running   0          119m    10.130.0.15   ci-ln-sb5ply2-72292-qlhc8-master-1         <none>           <none>
dns-default-mmc78     2/2     Running   0          112m    10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>           <none>

3. Update the daemonset's tolerations by removing "node-role.kubernetes.io/master" and adding any other toleration (a non-existent taint works too); an example patch is shown after the output below:

$ oc -n openshift-dns get ds dns-default -o yaml | yq .spec.template.spec.tolerations
- key: test-taint
  operator: Exists
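For example, the toleration change in step 3 can be applied through the DNS operator config, which then propagates it to the daemonset (illustrative patch, assuming the default DNS operator API):

$ oc patch dns.operator/default --type=merge -p '{"spec":{"nodePlacement":{"tolerations":[{"key":"test-taint","operator":"Exists"}]}}}'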

Actual results:

$ oc -n openshift-dns get pods  -o wide | grep dns-default
dns-default-6bfmf     2/2     Running   0          124m    10.129.0.40   ci-ln-sb5ply2-72292-qlhc8-master-2         <none>           <none>
dns-default-76vjz     0/2     Pending   0          3m2s    <none>        <none>                                     <none>           <none>
dns-default-9cjdf     2/2     Running   0          7m24s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>           <none>
dns-default-c6j9x     2/2     Running   0          124m    10.128.0.13   ci-ln-sb5ply2-72292-qlhc8-master-0         <none>           <none>
dns-default-fhqrs     2/2     Running   0          7m1s    10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>           <none>
dns-default-lx2nf     2/2     Running   0          124m    10.130.0.15   ci-ln-sb5ply2-72292-qlhc8-master-1         <none>           <none>
dns-default-mmc78     2/2     Running   0          117m    10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>           <none>

Expected results:

$ oc -n openshift-dns get pods  -o wide | grep dns-default
dns-default-9cjdf     2/2     Running   0          7m24s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>           <none>
dns-default-fhqrs     2/2     Running   0          7m1s    10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>           <none>
dns-default-mmc78     2/2     Running   0          7m54s   10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>           <none>

Additional info:
Upstream issue: https://github.com/kubernetes/kubernetes/issues/118823
Slack discussion: https://redhat-internal.slack.com/archives/CKJR6200N/p1687455135950439

Description of problem:
Create two custom SCCs with different permissions, for example, custom-scc-1 with 'privileged' and custom-scc-2 with 'restricted'. Deploy a pod with annotations "openshift.io/required-scc: custom-scc-1, custom-scc-2". Pod deployment failed with error "Error creating: pods "test-747555b669-" is forbidden: required scc/custom-restricted-v2-scc, custom-privileged-scc not found". The system fails to provide appropriate error messages for multiple required SCC annotations, leaving users unable to identify the cause of the failure effectively.

 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-31-181848

How reproducible:

Always

Steps to Reproduce:

$ oc login -u testuser-0
$ oc new-project scc-test
$ oc create sa scc-test -n scc-test
serviceaccount/scc-test created

$ oc get scc restricted-v2 -o yaml --context=admin > custom-restricted-v2-scc.yaml
$ sed -i -e 's/restricted-v2/custom-restricted-v2-scc/g' -e "s/MustRunAsRange/RunAsAny/" -e "s/priority: null/priority: 10/" custom-restricted-v2-scc.yaml

$ oc create -f custom-restricted-v2-scc.yaml --context=admin
securitycontextconstraints.security.openshift.io/custom-restricted-v2-scc created

$ oc adm policy add-scc-to-user custom-restricted-v2-scc system:serviceaccount:scc-test:scc-test --context=admin
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:custom-restricted-v2-scc added: "scc-test"

$ oc get scc privileged -o yaml --context=admin > custom-privileged-scc.yaml
$ sed -i -e 's/privileged/custom-privileged-scc/g' -e "s/priority: null/priority: 5/" custom-privileged-scc.yaml

$ oc create -f custom-privileged-scc.yaml --context=admin
securitycontextconstraints.security.openshift.io/custom-privileged-scc created

$ oc adm policy add-scc-to-user custom-privileged-scc system:serviceaccount:scc-test:scc-test --context=admin
clusterrole.rbac.authorization.k8s.io/system:openshift:scc:custom-privileged-scc added: "scc-test"


$ cat deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  selector:
    matchLabels:
      deployment: test
  template:
    metadata:
      annotations:
        openshift.io/required-scc: custom-restricted-v2-scc, custom-privileged-scc
      labels:
        deployment: test
    spec:
      containers:
      - args:
        - infinity
        command:
        - sleep
        image: fedora:latest
        name: sleeper
      securityContext:
        runAsNonRoot: true
      serviceAccountName: scc-test


$ oc create -f deployment.yaml 
deployment.apps/test created

$ oc describe rs test-747555b669 | grep FailedCreate
  ReplicaFailure   True    FailedCreate
  Warning  FailedCreate  61s (x15 over 2m23s)  replicaset-controller  Error creating: pods "test-747555b669-" is forbidden: required scc/custom-restricted-v2-scc, custom-privileged-scc not found

Actual results:

Pod deployment failed with "Error creating: pods "test-747555b669-" is forbidden: required scc/custom-restricted-v2-scc, custom-privileged-scc not found"

Expected results:

Either it should ignore the second SCC instead of reporting "not found", or it should show a proper error message.

Additional info:

 

Description of problem:

When the management cluster has ICSP resources, the pull reference of the Kube APIServer is replaced with a pull ref from the management cluster ICSPs resulting in a pull failure.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Create a cluster with release registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-08-28-154013 on a management cluster that has ICSPs
2. Watch the kube-apiserver pods.
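An illustrative way to watch for the pull ref being rewritten (the hosted control plane namespace name is an assumption):

$ oc -n clusters-<hostedcluster-name> get deployment kube-apiserver -o jsonpath='{.spec.template.spec.containers[*].image}{"\n"}'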

Actual results:

kube-apiserver pods are initially deployed with a pull ref from the release payload and they start, but then the deployment is updated with a pull ref from an ICSP mapping and the deployment fails to roll out.

Expected results:

kube-apiserver pods roll out successfully.

Additional info:

 

This is a clone of issue OCPBUGS-19376. The following is the description of the original issue:

Description of problem:

IPI installation using the service account attached to a GCP VM always fails with the error "unable to parse credentials".

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-15-233408

How reproducible:

Always

Steps to Reproduce:

1. "create install-config"
2. edit install-config.yaml to insert "credentialsMode: Manual"
3. "create manifests"
4. manually create the required credentials and copy the manifests to installation-dir/manifests directory
5. launch the bastion host bound to the pre-configured service account ipi-on-bastion-sa@openshift-qe.iam.gserviceaccount.com with the "cloud-platform" scope
6. copy the installation-dir and openshift-install to the bastion host
7. try "create cluster" on the bastion host 
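Before step 7, a sanity check can be run on the bastion to confirm the attached service account and scopes via the standard GCE metadata endpoints:

$ curl -s -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/email"
$ curl -s -H "Metadata-Flavor: Google" "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/scopes"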

Actual results:

The installation failed on "Creating infrastructure resources"

Expected results:

The installation should succeed.

Additional info:

(1) FYI the 4.12 epic: https://issues.redhat.com/browse/CORS-2260

(2) 4.12.34 doesn't have the issue (Flexy-install/234112/). 

(3) 4.13.13 doesn’t have the issue (Flexy-install/234126/).

(4) The 4.14 errors (Flexy-install/234113/):
09-19 16:13:44.919  level=info msg=Consuming Master Ignition Config from target directory
09-19 16:13:44.919  level=info msg=Consuming Bootstrap Ignition Config from target directory
09-19 16:13:44.919  level=info msg=Consuming Worker Ignition Config from target directory
09-19 16:13:44.919  level=info msg=Credentials loaded from gcloud CLI defaults
09-19 16:13:49.071  level=info msg=Creating infrastructure resources...
09-19 16:13:50.950  level=error
09-19 16:13:50.950  level=error msg=Error: unable to parse credentials
09-19 16:13:50.950  level=error
09-19 16:13:50.950  level=error msg=  with provider["openshift/local/google"],
09-19 16:13:50.950  level=error msg=  on main.tf line 10, in provider "google":
09-19 16:13:50.950  level=error msg=  10: provider "google" {
09-19 16:13:50.950  level=error
09-19 16:13:50.950  level=error msg=unexpected end of JSON input
09-19 16:13:50.950  level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "cluster" stage: failed to create cluster: failed to apply Terraform: exit status 1
09-19 16:13:50.950  level=error
09-19 16:13:50.950  level=error msg=Error: unable to parse credentials
09-19 16:13:50.950  level=error
09-19 16:13:50.950  level=error msg=  with provider["openshift/local/google"],
09-19 16:13:50.950  level=error msg=  on main.tf line 10, in provider "google":
09-19 16:13:50.950  level=error msg=  10: provider "google" {
09-19 16:13:50.950  level=error
09-19 16:13:50.950  level=error msg=unexpected end of JSON input
09-19 16:13:50.950  level=error

Description of problem:

After installing a disconnected private cluster, ssh to the master/bootstrap nodes from the bastion on the VPC failed.

Version-Release number of selected component (if applicable):

Pre-merge build https://github.com/openshift/installer/pull/6836
registry.build05.ci.openshift.org/ci-ln-5g4sj02/release:latest
Tag: 4.13.0-0.ci.test-2023-02-27-033047-ci-ln-5g4sj02-latest

How reproducible:

always

Steps to Reproduce:

1. Create bastion instance maxu-ibmj-p1-int-svc
2. Create vpc on the bastion host
3. Install private disconnected cluster on the bastion host with mirror registry
4. ssh to the bastion
5. ssh to the master/bootstrap nodes from the bastion

Actual results:

[core@maxu-ibmj-p1-int-svc ~]$ ssh -i ~/openshift-qe.pem core@10.241.0.5 -v
OpenSSH_8.8p1, OpenSSL 3.0.5 5 Jul 2022
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug1: configuration requests final Match pass
debug1: re-parsing configuration
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Reading configuration data /etc/ssh/ssh_config.d/50-redhat.conf
debug1: Reading configuration data /etc/crypto-policies/back-ends/openssh.config
debug1: Connecting to 10.241.0.5 [10.241.0.5] port 22.
debug1: connect to address 10.241.0.5 port 22: Connection timed out
ssh: connect to host 10.241.0.5 port 22: Connection timed out

Expected results:

ssh succeed.

Additional info:

$ibmcloud is sg-rules r014-5a6c16f4-8a4c-4c02-ab2d-626c14f72a77 --vpc maxu-ibmj-p1-vpc
Listing rules of security group r014-5a6c16f4-8a4c-4c02-ab2d-626c14f72a77 under account OpenShift-QE as user ServiceId-dff277a9-b608-410a-ad24-c544e59e3778...
ID                                          Direction   IP version   Protocol                      Remote   
r014-6739d68f-6827-41f4-b51a-5da742c353b2   outbound    ipv4         all                           0.0.0.0/0   
r014-06d44c15-d3fd-4a14-96c4-13e96aa6769c   inbound     ipv4         all                           shakiness-perfectly-rundown-take
r014-25b86956-5370-4925-adaf-89dfca9fb44b   inbound     ipv4         tcp Ports:Min=22,Max=22       0.0.0.0/0
r014-e18f0f5e-c4e5-44a5-b180-7a84aa59fa97   inbound     ipv4         tcp Ports:Min=3128,Max=3129   0.0.0.0/0   
r014-7e79c4b7-d0bb-4fab-9f5d-d03f6b427d89   inbound     ipv4         icmp Type=8,Code=0            0.0.0.0/0   
r014-03f23b04-c67a-463d-9754-895b8e474e75   inbound     ipv4         tcp Ports:Min=5000,Max=5000   0.0.0.0/0   
r014-8febe8c8-c937-42b6-b352-8ae471749321   inbound     ipv4         tcp Ports:Min=6001,Max=6002   0.0.0.0/0   

Description of problem:

The OpenShift Console fails to render the Monitoring Dashboard when a proxy is expected to be used. Additionally, WebSocket connections fail because they do not use the proxy.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1. Connect to a cluster using backplane and use one of IT's proxies
2. Execute "ocm backplane console -b"
3. Attempt to view the monitoring dashboard

Actual results:

Monitoring dashboard fails to load with an EOF error
Terminal is spammed with EOF errors

Expected results:

Monitoring dashboard should be rendered correctly
Terminal should not be spammed with error logs

Additional info:

When we apply changes like those in this PR, the monitoring dashboard works with the proxy: https://github.com/openshift/console/pull/12877

This is a clone of issue OCPBUGS-19783. The following is the description of the original issue:

Description of problem:

When navigating to the create Channel page from Add or Topology, the default name "channel" is present, but the Create button is still disabled with "Required" showing under the name field.

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2023-09-26-042251

How reproducible:

Always

Steps to Reproduce:

1. Install serverless operator
2. Go to Add page in developer perspective
3. Click on the channel card

Actual results:

The Create button is disabled with an error showing "Required" under the name field, even though the name field contains the default name "channel".

Expected results:

The create button should be active

Additional info:

If you switch to the YAML view, the Create button becomes active, and if you switch back to the form view, the Create button remains active.

RHEL 9.3 broke at least ironic when it rebased python-dns to 2.3.0

dnspython 2.3.0 raised AttributeError: module 'dns.rdtypes' has no attribute 'ANY' https://github.com/eventlet/eventlet/issues/781

Description of problem:

API documentation for HostedCluster states that the webhook kubeconfig field is only supported for IBM Cloud. It should be supported for all platforms.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Review API documentation at https://hypershift-docs.netlify.app/reference/api/

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

When the releaseImage is a digest, for example quay.io/openshift-release-dev/ocp-release@sha256:bbf1f27e5942a2f7a0f298606029d10600ba0462a09ab654f006ce14d314cb2c, a spurious warning is output when running
openshift-install agent create image

It's not calculating the releaseImage properly (note the truncated '@sha256' suffix below), so it emits this spurious message:
WARNING The ImageContentSources configuration in install-config.yaml should have at-least one source field matching the releaseImage value quay.io/openshift-release-dev/ocp-release@sha256

This can cause confusion for users.
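For reference, a minimal install-config.yaml snippet of the shape the check expects, assuming a local mirror registry (the registry host is a placeholder):

imageContentSources:
- mirrors:
  - <local-registry>:5000/openshift-release-dev/ocp-release
  source: quay.io/openshift-release-dev/ocp-release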

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Every time a release image with a digest is used

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Sanitize OWNERS/OWNER_ALIASES:

1) OWNERS must have:

component: "Storage / Kubernetes External Components"

2) OWNER_ALIASES must have all team members of Storage team.

Version:

$ openshift-install version
./openshift-install 4.11.0-0.nightly-2022-07-13-131410
built from commit cdb9627de7efb43ad7af53e7804ddd3434b0dc58
release image registry.ci.openshift.org/ocp/release@sha256:c5413c0fdd0335e5b4063f19133328fee532cacbce74105711070398134bb433
release architecture amd64

Platform:

  • Azure IPI

What happened?
When one creates an IPI Azure cluster with an `internal` publishing method, it creates a standard load balancer with an empty definition. This load balancer doesn't serve a purpose as far as I can tell since the configuration is completely empty. Because it doesn't have a public IP address and backend pools it's not providing any outbound connectivity, and there are no frontend IP configurations for ingress connectivity to the cluster.

Below is the ARM template that is deployed by the installer (through terraform)

```
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "loadBalancers_mgahagan411_7p82n_name": {
      "defaultValue": "mgahagan411-7p82n",
      "type": "String"
    }
  },
  "variables": {},
  "resources": [
    {
      "type": "Microsoft.Network/loadBalancers",
      "apiVersion": "2020-11-01",
      "name": "[parameters('loadBalancers_mgahagan411_7p82n_name')]",
      "location": "northcentralus",
      "sku": {
        "name": "Standard",
        "tier": "Regional"
      },
      "properties": {
        "frontendIPConfigurations": [],
        "backendAddressPools": [],
        "loadBalancingRules": [],
        "probes": [],
        "inboundNatRules": [],
        "outboundRules": [],
        "inboundNatPools": []
      }
    }
  ]
}
```

What did you expect to happen?

  • Don't create the standard load balancer on an internal Azure IPI cluster (as it appears to serve no purpose)

How to reproduce it (as minimally and precisely as possible)?
1. Create an IPI cluster with the `publish` installation config set to `Internal` and the `outboundType` set to `UserDefinedRouting`.
```
apiVersion: v1
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    azure: {}
  replicas: 3
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    azure: {}
  replicas: 3
metadata:
  name: mgahaganpvt
platform:
  azure:
    region: northcentralus
    baseDomainResourceGroupName: os4-common
    outboundType: UserDefinedRouting
    networkResourceGroupName: mgahaganpvt-rg
    virtualNetwork: mgahaganpvt-vnet
    controlPlaneSubnet: mgahaganpvt-master-subnet
    computeSubnet: mgahaganpvt-worker-subnet
pullSecret: HIDDEN
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  serviceNetwork:
  - 172.30.0.0/16
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
publish: Internal
proxy:
  httpProxy: http://proxy-user1:password@10.0.0.0:3128
  httpsProxy: http://proxy-user1:password@10.0.0.0:3128
baseDomain: qe.azure.devcluster.openshift.com
```

2. Show the json content of the standard load balancer is completely empty
`az network lb show -g myResourceGroup -n myLbName`

```
{
  "name": "mgahagan411-7p82n",
  "id": "/subscriptions/00000000-0000-0000-00000000/resourceGroups/mgahagan411-7p82n-rg/providers/Microsoft.Network/loadBalancers/mgahagan411-7p82n",
  "etag": "W/\"40468fd2-e56b-4429-b582-6852348b6a15\"",
  "type": "Microsoft.Network/loadBalancers",
  "location": "northcentralus",
  "tags": {},
  "properties": {
    "provisioningState": "Succeeded",
    "resourceGuid": "6fb11ec9-d89f-4c05-b201-a61ea8ed55fe",
    "frontendIPConfigurations": [],
    "backendAddressPools": [],
    "loadBalancingRules": [],
    "probes": [],
    "inboundNatRules": [],
    "inboundNatPools": []
  },
  "sku": {
    "name": "Standard"
  }
}
```

Description of problem:

I found an old shell error while checking logs: the variable passed to `[ -z ]` is not quoted.

    if [ -z $DHCP6_IP6_ADDRESS ]
    then
        >&2 echo "Not a DHCP6 address. Ignoring."
        exit 0
    fi

https://github.com/openshift/machine-config-operator/blob/master/templates/common/baremetal/files/NetworkManager-static-dhcpv6.yaml#L8


Dec 05 12:05:02 master-0-2 nm-dispatcher[1365]: time="2022-12-05T12:05:02Z" level=debug msg="Ignoring filtered route {Ifindex: 10 Dst: fd2e:6f44:5dd8::59/128 Src: <nil> Gw: <nil> Flags: [] Table: 254}"
Dec 05 12:05:02 master-0-2 nm-dispatcher[1365]: time="2022-12-05T12:05:02Z" level=debug msg="Ignoring filtered route {Ifindex: 10 Dst: fd2e:6f44:5dd8::5a/128 Src: <nil> Gw: <nil> Flags: [] Table: 254}"

Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: req:19 'up' [br-ex], "/etc/NetworkManager/dispatcher.d/30-static-dhcpv6": run script
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: + '[' -z fd2e:6f44:5dd8::5a fd2e:6f44:5dd8::59 ']'
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: /etc/NetworkManager/dispatcher.d/30-static-dhcpv6: line 4: [: fd2e:6f44:5dd8::5a: binary operator expected
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: ++ ip -j -6 a show br-ex
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: ++ jq -r '.[].addr_info[] | select(.scope=="global") | select(.deprecated!=true) | select(.local=="fd2e:6f44:5dd8::5a fd2e:6f44:5dd8::59") | .preferred_life_time'
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: + LEASE_TIME=
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: ++ ip -j -6 a show br-ex
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: ++ jq -r '.[].addr_info[] | select(.scope=="global") | select(.deprecated!=true) | select(.local=="fd2e:6f44:5dd8::5a fd2e:6f44:5dd8::59") | .prefixlen'
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: + PREFIX_LEN=
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: + '[' 0 -lt 4294967295 ']'
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: + echo 'Not an infinite DHCP6 lease. Ignoring.'
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: Not an infinite DHCP6 lease. Ignoring.
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: + exit 0
Dec 05 12:05:27 master-0-2 nm-dispatcher[1365]: req:19 'up' [




Version-Release number of selected component (if applicable):

4.10.0-0.nightly-2022-11-30-111136

How reproducible:

Twice

Steps to Reproduce:

1. Somehow DHCPv6 provides two IPv6 leases
2. NetworkManager sets $DHCP6_IP6_ADDRESS to all of the IPv6 addresses, separated by spaces
3. Bash error

Actual results:


/etc/NetworkManager/dispatcher.d/30-static-dhcpv6: line 4: [: fd2e:6f44:5dd8::5a: binary operator expected

Expected results:

shell inputs are sanitized or properly quoted.

Additional info:

Description of problem:

It's not currently possible to override the base image selected by the command:

$ openshift-install agent create image

Defining the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE variable also has no effect.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

By defining the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE when creating the image

 

Steps to Reproduce:

1. $ OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE=<valid url to rhcos image> 
2. $ openshift-install agent create image
3.
 

Actual results:

The agent ISO is built by using the embedded rhcos.json metadata, instead of the rhcos image specified in the OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE

Expected results:

Defining OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE should allow overriding the base image selected for creating the agent ISO

Additional info:

 

Description of problem:

While trying to deploy OCP on GCP, the installer gets stuck on the very first step, trying to list all the projects that the GCP service account used to deploy OCP can see.

Version-Release number of selected component (if applicable):

4.13.3 but also happening on 4.12.5 and I presume other releases as well

How reproducible:

Every time

Steps to Reproduce:

1. Use openshift-install to create a cluster in GCP

Actual results:

$ ./openshift-install-4.13.3 create cluster --dir gcp-doha/ --log-level debug
DEBUG OpenShift Installer 4.13.3                   
DEBUG Built from commit 90bb61f38881d07ce94368f0b34089d152ffa4ef 
DEBUG Fetching Metadata...                         
DEBUG Loading Metadata...                          
DEBUG   Loading Cluster ID...                      
DEBUG     Loading Install Config...                
DEBUG       Loading SSH Key...                     
DEBUG       Loading Base Domain...                 
DEBUG         Loading Platform...                  
DEBUG       Loading Cluster Name...                
DEBUG         Loading Base Domain...               
DEBUG         Loading Platform...                  
DEBUG       Loading Networking...                  
DEBUG         Loading Platform...                  
DEBUG       Loading Pull Secret...                 
DEBUG       Loading Platform...                    
INFO Credentials loaded from environment variable "GOOGLE_CREDENTIALS", file "/home/mak/.gcp/aos-serviceaccount.json"
ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: platform.gcp.project: Internal error: context deadline exceeded 

Expected results:

The cluster should be deployed with no issues

Additional info:

The GCP user used to deploy OCP has visibility of thousands of projects:

> gcloud projects list | wc -l
  152793

Description of problem:

QE Prow CI jobs update hostedcluster.spec.pullSecret for some QE catalog source configurations. The 4.13 jobs failed with this error message:

Error from server (HostedCluster.spec.pullSecret.name: Invalid value: "9509a26c339de31aa3c9-pull-secret-new": Attempted to change an immutable field): admission webhook "hostedclusters.hypershift.openshift.io" denied the request: HostedCluster.spec.pullSecret.name: Invalid value: "9509a26c339de31aa3c9-pull-secret-new": Attempted to change an immutable field

Version-Release number of selected component (if applicable):

4.13

How reproducible:

4.13 job:
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/openshift_release/41339/rehearse-41339-periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-aws-ipi-ovn-hypershift-guest-p1-f7/1689831180221812736

Steps to Reproduce:

see the above job

Actual results:

job failed to config pull secret for hostedcluster

Expected results:

job could run successfully

Additional info:

1. The 4.14 HyperShift QE CI jobs were successfully executed with the same code.
2. I can update the 4.13 hostedcluster spec.pullSecret in my local HyperShift env.

It seems to be caused by some limitation that exists only in Prow?

 

slack thread: https://redhat-internal.slack.com/archives/C01C8502FMM/p1691736890938529

Description of problem:

Version-Release number of selected component (if applicable):
All versions?
At least on 4.12+

How reproducible:
Always

Steps to Reproduce:

  1. Open the console and click on the + sign in the top right navigation header.

This JSON works fine:

{
  "apiVersion": "v1",
  "kind": "ConfigMap",
  "metadata": {
    "generateName": "a-configmap-"
  }
}

But an array cannot be used to import multiple resources:

[
  {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {
      "generateName": "a-configmap-"
    }
  },
  {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {
      "generateName": "a-configmap-"
    }
  }
]

Fails with error: No "apiVersion" field found in YAML.

Nor can a Kubernetes List "resource" be used:

{
  "apiVersion": "v1",
  "kind": "List",
  "items": [
    {
      "apiVersion": "v1",
      "kind": "ConfigMap",
      "metadata": {
        "generateName": "a-configmap-"
      }
    },
    {
      "apiVersion": "v1",
      "kind": "ConfigMap",
      "metadata": {
        "generateName": "a-configmap-"
      }
    }
  ]
}

Fails with error: The server doesn't have a resource type "kind: List, apiVersion: v1".

Actual results:
Neither JSON structure could be imported.

Expected results:
Both JSON structures work and create multiple resources.

If the JSON array contains just one item, the resource detail page should be opened; otherwise, an import result page should be shown, similar to when the user imports a YAML with multiple resources.

Additional info:
Found this JSON structure for example in issue OCPBUGS-4646

Description of problem:

The IPv6 VIP does not seem to be present in the keepalived.conf.

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd65:10:128::/56
    hostPrefix: 64
  machineNetwork:
  - cidr: 192.168.110.0/23
  - cidr: fd65:a1a8:60ad::/112
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
  - fd65:172:16::/112
platform:
  vsphere:
    apiVIPs:
    - 192.168.110.116
    - fd65:a1a8:60ad:271c::1116
    ingressVIPs:
    - 192.168.110.117
    - fd65:a1a8:60ad:271c::1117
    vcenters:
    - datacenters:
      - IBMCloud
      server: ibmvcenter.vmc-ci.devcluster.openshift.com

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-04-21-084440

How reproducible:

Frequently: 2 failures out of 3 attempts.

Steps to Reproduce:

1. Install vSphere dual-stack with dual VIPs, see above config
2. Check keepalived.conf
for f in $(oc get pods -n openshift-vsphere-infra -l app=vsphere-infra-vrrp --no-headers -o custom-columns=N:.metadata.name  ) ; do oc -n openshift-vsphere-infra exec -c keepalived $f -- cat /etc/keepalived/keepalived.conf | tee $f-keepalived.conf ; done

Actual results:

IPv6 VIP is not in keepalived.conf

Expected results:

vrrp_instance rbrattai_INGRESS_1 {
    state BACKUP
    interface br-ex
    virtual_router_id 129
    priority 20
    advert_int 1

    unicast_src_ip fd65:a1a8:60ad:271c::cc
    unicast_peer {
        fd65:a1a8:60ad:271c:9af:16a9:cb4f:d75c
        fd65:a1a8:60ad:271c:86ec:8104:1bc2:ab12
        fd65:a1a8:60ad:271c:5f93:c9cf:95f:9a6d
        fd65:a1a8:60ad:271c:bb4:de9e:6d58:89e7
        fd65:a1a8:60ad:271c:3072:2921:890:9263
    }
...
    virtual_ipaddress {
        fd65:a1a8:60ad:271c::1117/128
    }
...
}

Additional info:

See OPNET-207

Description of problem:

The Jenkins and Jenkins Agent Base image versions need to be updated to use the latest images to mitigate known CVEs in plugins and Jenkins versions.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

The advertise address configured for our HCP etcd clusters is not resolvable via DNS (i.e. etcd-0.etcd-client.namespace.svc:2379). This impacts some of the etcd tooling that expects to access each member by its advertise address.

Version-Release number of selected component (if applicable):

4.14 (and earlier)

How reproducible:

Always

Steps to Reproduce:

1. Create a HostedCluster and wait for it to come up.
2. Exec into an etcd pod and query cluster endpoint health:
   $ oc rsh etcd-0
   $ etcdctl --cacert /etc/etcd/tls/etcd-ca/ca.crt \
             --cert /etc/etcd/tls/server/server.crt \
             --key /etc/etcd/tls/server/server.key \
             --endpoints https://localhost:2379 \
             endpoint health --cluster -w table

Actual results:

An error is returned similar to:
{"level":"warn","ts":"2023-08-07T20:40:49.890254Z","logger":"client","caller":"v3@v3.5.9/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000378fc0/etcd-0.etcd-client.clusters-test-cluster.svc:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp: lookup etcd-0.etcd-client.clusters-test-cluster.svc on 172.30.0.10:53: no such host\""}

Expected results:

Actual cluster health is returned:
+--------------------------------------------------------------+--------+-------------+-------+
|                           ENDPOINT                           | HEALTH |    TOOK     | ERROR |
+--------------------------------------------------------------+--------+-------------+-------+
| https://etcd-0.etcd-discovery.clusters-cewong-guest.svc:2379 |   true |  9.372168ms |       |
| https://etcd-2.etcd-discovery.clusters-cewong-guest.svc:2379 |   true | 12.269226ms |       |
| https://etcd-1.etcd-discovery.clusters-cewong-guest.svc:2379 |   true | 12.291392ms |       |
+--------------------------------------------------------------+--------+-------------+-------+

Additional info:

The etcd statefulset is created with spec.serviceName set to `etcd-discovery`. This means that pods in the statefulset get the subdomain `etcd-discovery`, and names like etcd-0.etcd-discovery.[ns].svc are resolvable. However, the same is not true for the etcd-client service: etcd-0.etcd-client.[ns].svc is not resolvable. The fix would be to change the advertise address of each member to a resolvable name (i.e. etcd-0.etcd-discovery.[ns].svc) and adjust the server certificate to allow those names as well.
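As a minimal, illustrative sketch of why only the `etcd-discovery` names resolve (namespace, image, and sizes are placeholders, not the HyperShift manifests): per-pod DNS records are created only for the headless Service named in the StatefulSet's `spec.serviceName`, not for other Services such as `etcd-client`.

```
apiVersion: v1
kind: Service
metadata:
  name: etcd-discovery
  namespace: clusters-example
spec:
  clusterIP: None          # headless: creates etcd-0.etcd-discovery.clusters-example.svc
  selector:
    app: etcd
  ports:
  - name: client
    port: 2379
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: etcd
  namespace: clusters-example
spec:
  serviceName: etcd-discovery   # per-pod DNS names are derived from this service
  replicas: 3
  selector:
    matchLabels:
      app: etcd
  template:
    metadata:
      labels:
        app: etcd
    spec:
      containers:
      - name: etcd
        image: quay.io/example/etcd:latest   # placeholder image
        ports:
        - containerPort: 2379
```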

Description of problem:

We need to export the hook function from the module that's required in the dynamic core API; otherwise an exception is thrown when the hook is imported/used by plugins.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

Plugins using this hook throw an exception.

Expected results:

The hook should be imported and function properly.

Additional info:

 

Description of the problem:

When invoking installation with the assisted-service scripts (make deploy-all), as is done when installing in the PSI environment, the assisted-service and assisted-image-service pods produce warnings about a failing readiness probe:

Readiness probe failed: Get "http://172.28.8.39:8090/ready": dial tcp 172.28.8.39:8090: connect: connection refused

Those warnings are harmless, but they make people think that there is a problem with the running pods (or that they are not ready yet, even though the pods are marked as ready).

How reproducible:

100%

Steps to reproduce:

1. invoke make deploy-all on PSI or other places (for some reason it doesn't reproduce on minikube)

2. inspect the pod's conditions part with oc describe, and look for warnings

Actual results:

Warnings emitted 

Expected results:
No warnings should be emitted for the initial setup time of each pod. The fix just requires setting initialDelaySeconds in the readinessProbe configuration, just like we did in the template: https://github.com/openshift/assisted-service/pull/4557 
see also: https://github.com/openshift/assisted-service/pull/380#pullrequestreview-490308765 
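For illustration, a minimal sketch of the kind of readinessProbe change described above (the deployment name, image, and delay value are placeholders, not the actual assisted-service template):

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: assisted-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: assisted-service
  template:
    metadata:
      labels:
        app: assisted-service
    spec:
      containers:
      - name: assisted-service
        image: quay.io/example/assisted-service:latest   # placeholder image
        ports:
        - containerPort: 8090
        readinessProbe:
          httpGet:
            path: /ready
            port: 8090
          initialDelaySeconds: 30   # avoid probing before the server is listening
          periodSeconds: 10
```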

Description of problem:

The operator catalog images used in 4.13 hosted clusters are the ones from 4.12

Version-Release number of selected component (if applicable):

4.13.z

How reproducible:

Always

Steps to Reproduce:

1. Create a 4.13 HostedCluster
2. Inspect the image tags used for catalog imagestreams (oc get imagestreams -n CONTROL_PLANE_NAMESPACE)

Actual results:

image tags point to 4.12 catalog images

Expected results:

image tags point to 4.13 catalog images

Additional info:

These image tags need to be updated: https://github.com/openshift/hypershift/blob/release-4.13/control-plane-operator/controllers/hostedcontrolplane/olm/catalogs.go#L117-L120

Description of problem:

An alert notification receiver created through the web console uses the deprecated `match` field instead of `matchers`. If `match` is then renamed to `matchers` without converting the entries into the matcher list syntax, the Alertmanager pods go into CrashLoopBackOff, throwing the error:
~~~
ts=2023-11-14T08:42:39.694Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config_out/alertmanager.env.yaml err="yaml: unmarshal errors:\n  line 51: cannot unmarshal !!map into []string"
~~~

Version-Release number of selected component (if applicable):

 

How reproducible:

Always 

Steps to Reproduce:

1. Create alert notification receiver through web console.
Administration-->configuration-->Alertmanager-->create receiver-->add receiver

2. Check the generated YAML, which contains a route section with `match` rather than `matchers`.

3. Rename `match` to `matchers` without correctly converting the defined matchers (such as severity or alertname) into the list syntax.

4. Restart the Alertmanager pods, which then enter CrashLoopBackOff.

Actual results:

Alert notification receiver uses match field

Expected results:

The alert notification receiver should use the `matchers` field.

Additional info:
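For illustration only (the receiver name and label value are hypothetical), the deprecated `match` map the console currently generates versus the `matchers` list syntax Alertmanager expects:

```
route:
  receiver: default
  routes:
  # deprecated form currently generated by the console
  - receiver: team-critical
    match:
      severity: critical
  # supported form using matchers
  - receiver: team-critical
    matchers:
    - severity = "critical"
```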

 

Description of problem:
When importing a Serverless Service from a git repository, the topology shows an Open URL decorator even when the "Add Route" checkbox was unselected (it is selected by default).

The created kn Route makes the Service available within the cluster and the created URL looks like this: http://nodeinfo-private.serverless-test.svc.cluster.local

So the Service is NOT accidentally exposed. It's "just" that we link an internal route that will not be accessible to the user.

This might also happen for the Serverless functions import flow and the container image import flow.

Version-Release number of selected component (if applicable):
Tested older versions and could see this at least on 4.10+

How reproducible:
Always

Steps to Reproduce:

  1. Install the OpenShift Serverless operator and create the required kn Serving resource.
  2. Navigate to the Developer perspective > Add > Import from Git
  3. Enter a git repository (like https://gitlab.com/jerolimov/nodeinfo)
  4. Unselect "Add Route" and press Create

Actual results:
The topology shows the new kn Service with an Open URL decorator in the top right corner.

The button is clickable but the target page could not be opened (as expected).

Expected results:
The topology should not show an Open URL decorator for "private" kn Routes.

The topology sidebar shows similar information; we should maybe replace the link there as well with a text + copy button?

A fix should also be tested with Serverless functions and container images!

Additional info:
When the user unselects the "Add route" option an additional label is added to the kn Service. This label could also be added and removed later. When this label is specified the Open URL decorator should not be shown:

metadata:
  labels:
    networking.knative.dev/visibility: cluster-local

See also:

https://github.com/openshift/console/blob/1f6e238b924f4a4337ef917a0eba8aadae161e9c/frontend/packages/knative-plugin/src/utils/create-knative-utils.ts#L108

https://github.com/openshift/console/blob/1f6e238b924f4a4337ef917a0eba8aadae161e9c/frontend/packages/knative-plugin/src/topology/components/decorators/getServiceRouteDecorator.tsx#L15-L21

Description of problem:

A customer is raising security concerns about using port 80 for bootstrap

Version-Release number of selected component (if applicable):

4.13

RFE-3577

Description of the problem:

Adding an invalid label (key or value) to a node returns error code 500 "Internal Server Error" instead of 400 "Bad Request".

 

How reproducible:

100%

 

Steps to reproduce:

1. Create a cluster

2. Boot node from ISO

3. Add invalid label, invalid key or value

e.g:

curl -s -H 'Content-Type: application/json' -X PATCH -d '{"node_labels": [{"key": "Label-1", "value": "Label1*1"},{"key": "worker.label2", "value": "Label-2"}]}' https://api.stage.openshift.com/api/assisted-install/v2/infra-envs/8603fe29-e67f-49ad-8ba7-7a256bcb3923/hosts/af629f1e-da67-4211-97f0-f27cb10471ff --header "Authorization: Bearer $(ocm token)"

 

Actual results:

Action failed with error code 500

{"code":"500","href":"","id":500,"kind":"Error","reason":"node_labels: Invalid value: \"Label1*1\": a valid label must be an empty string or consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyValue',  or 'my_value',  or '12345', regex used for validation is '(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?')"}

 

Expected results:

Action failed with error code 400

Description of problem:

When a cluster has an abnormal operator status, running `oc adm must-gather` exits with code 1.

Version-Release number of selected component (if applicable):

4.12/4.13

Actual results:

     [must-gather      ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-gfcpc deleted
      
      
      Reprinting Cluster State:
      When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
      ClusterID: 0ba6ca81-e6d8-4d15-b345-70f81bd5a005
      ClusterVersion: Stable at "4.13.0-0.nightly-2023-04-01-062001"
      ClusterOperators:
      	clusteroperator/cloud-credential is not upgradeable because Upgradeable annotation cloudcredential.openshift.io/upgradeable-to on cloudcredential.operator.openshift.io/cluster object needs updating before upgrade. See Manually Creating IAM documentation for instructions on preparing a cluster for upgrade.
      	clusteroperator/ingress is progressing: ingresscontroller "test-34166" is progressing: IngressControllerProgressing: One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 0 of 1 updated replica(s) are available...
      ).
      Not all ingress controllers are available.
      
      
      
      STDERR:
      error: yaml: line 7: did not find expected key
      [08:06:46] INFO> Exit Status: 1
Expected results:
abnormal status of any of the operators should not affect must-gather's exit code

Additional info:

Description of problem:

Fix cnf compute tests to check scheduler settings under /sys/kernel/debug/sched/ 

Version-Release number of selected component (if applicable):

4.14

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

APIServer service not selected correctly for PublicAndPrivate when external-dns isn't configured. 
Image: 4.14 Hypershift operator + OCP 4.14.0-0.nightly-2023-03-23-050449

jiezhao-mac:hypershift jiezhao$ oc get hostedcluster/jz-test -n clusters -ojsonpath='{.spec.platform.aws.endpointAccess}{"\n"}'
PublicAndPrivate

    - lastTransitionTime: "2023-03-24T15:13:15Z"
      message: Cluster operators console, dns, image-registry, ingress, insights,
        kube-storage-version-migrator, monitoring, openshift-samples, service-ca are
        not available
      observedGeneration: 3
      reason: ClusterOperatorsNotAvailable
      status: "False"
      type: ClusterVersionSucceeding

services:
- service: APIServer
  servicePublishingStrategy:
    type: LoadBalancer
- service: OAuthServer
  servicePublishingStrategy:
    type: Route
- service: Konnectivity
  servicePublishingStrategy:
    type: Route
- service: Ignition
  servicePublishingStrategy:
    type: Route
- service: OVNSbDb
  servicePublishingStrategy:
    type: Route

jiezhao-mac:hypershift jiezhao$ oc get service -n clusters-jz-test | grep kube-apiserver
kube-apiserver            LoadBalancer  172.30.211.131  aa029c422933444139fb738257aedb86-9e9709e3fa1b594e.elb.us-east-2.amazonaws.com  6443:32562/TCP         34m
kube-apiserver-private        LoadBalancer  172.30.161.79  ab8434aa316e845c59690ca0035332f0-d818b9434f506178.elb.us-east-2.amazonaws.com  6443:32100/TCP         34m
jiezhao-mac:hypershift jiezhao$

jiezhao-mac:hypershift jiezhao$ cat hostedcluster.kubeconfig | grep server
  server: https://ab8434aa316e845c59690ca0035332f0-d818b9434f506178.elb.us-east-2.amazonaws.com:6443
jiezhao-mac:hypershift jiezhao$

jiezhao-mac:hypershift jiezhao$ oc get node --kubeconfig=hostedcluster.kubeconfig 
E0324 11:17:44.003589   95300 memcache.go:238] couldn't get current server API group list: Get "https://ab8434aa316e845c59690ca0035332f0-d818b9434f506178.elb.us-east-2.amazonaws.com:6443/api?timeout=32s": dial tcp 10.0.129.24:6443: i/o timeout

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Create a PublicAndPrivate cluster without external-dns
2. Access the guest cluster (it should fail)
3.

Actual results:

Unable to access the guest cluster via 'oc get node --kubeconfig=<guest cluster kubeconfig>'; some guest cluster cluster operators are not available.

Expected results:

The cluster is up and running, the guest cluster can be accessed via 'oc get node --kubeconfig=<guest cluster kubeconfig>'

Additional info:

 

 

Description: During an upgrade from non-IC to IC, the CNO status logic looks up a well-known configmap that indicates whether an upgrade to IC is ongoing, in order not to report the new operator version (4.14) until the second and final phase of the IC upgrade is done.

The following corrections are needed:

  •  CNO shouldn't report new version if IC configmap can't be retrieved for whatever reason, as suggested in a review to the CNO PR that enabled IC support: https://github.com/openshift/cluster-network-operator/pull/1874#pullrequestreview-1560616992
  •  looking up the key "ongoing-upgrade" inside the IC configmap is enough; checking for its value to be "true" is not correct, since after code reviews it was decided not to set the value to "true", but to set it to an empty string;
  • (optimization) CNO shouldn't look up the IC configmap if the cluster is not running ovn kubernetes 

Description of the problem:

If an interface name is over 15 characters long, NetworkManager refuses to bring the interface up.

How reproducible:

Depends on the system interface names

Steps to reproduce:

1. Create a cluster with static networking (a vlan with a large id works best)

2. Boot a host with the discovery ISO

Actual results:

Host interface does not come up if the resulting interface name is over 15 characters

Expected results:

Interfaces should always come up

Additional info:

Slack thread: https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1689956128746919?thread_ts=1689774706.220319&cid=CUPJTHQ5P

Attached a screenshot of the log stating the connection name is too long.

This happens because our script to apply static networking on a host uses the host interface name and appends the extension nmstate added for the interface.

In this case the interface name was enp94s0f0np0 with a vlan id of 2507. This meant that the resulting interface name was enp94s0f0np0.2507 (17 characters).

When configuring this interface manually as a workaround, the user stated that the interface name (not the vlan id) was truncated to accommodate the length limit.
So in this case the valid interface created by nmcli was "enp94s0f0n.2507"; we should attempt to replicate this behavior.

Also attached a screenshot of the working interface.

Description of problem

The oc adm node-logs feature has been upstreamed and is part of k8s 1.27. This resulted in the addition of the kubelet configuration field enableSystemLogQuery to enable the feature. This feature has been enabled in the base kubelet configs in MCO. However, in situations where TechPreview is enabled, MCO generates a kubelet configuration that overwrites the default, and during the unmarshal and marshal cycle it drops the field it is not aware of. This is because MCO currently vendors k8s.io/kubelet at v0.25.1 and can be fixed by vendoring v0.27.1.
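For reference, a minimal sketch of the kubelet configuration fragment whose field gets dropped during the re-marshal (the rest of the rendered config is omitted here):

```
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# field added for the upstreamed `oc adm node-logs` support; it is lost when the
# config is round-tripped through the older vendored kubelet API types
enableSystemLogQuery: true
```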

How reproducible: always

Steps to Reproduce:

1. Bring up a 4.14 cluster with TechPreview enabled
2. Run oc adm node-logs
3.

Actual results:

Command returns "<a href="ec274df5b608cc7a149ece1ce673306c/">ec274df5b608cc7a149ece1ce673306c/</a>" which is the contents of /var/log/journal

Expected results:

Should return journal logs from the node

Additional info

I took a quick cut of updating the OpenShift and k8s APIs to 1.27. Running into the following during make verify:

cmd/machine-config-controller/start.go:18:2: could not import github.com/openshift/machine-config-operator/pkg/controller/template (-: # github.com/openshift/machine-config-operator/pkg/controller/template
pkg/controller/template/render.go:396:91: cannot use cfg.FeatureGate (variable of type *"github.com/openshift/api/config/v1".FeatureGate) as featuregates.FeatureGateAccess value in argument to cloudprovider.IsCloudProviderExternal: *"github.com/openshift/api/config/v1".FeatureGate does not implement featuregates.FeatureGateAccess (missing method AreInitialFeatureGatesObserved)
pkg/controller/template/render.go:441:90: cannot use cfg.FeatureGate (variable of type *"github.com/openshift/api/config/v1".FeatureGate) as featuregates.FeatureGateAccess value in argument to cloudprovider.IsCloudProviderExternal: *"github.com/openshift/api/config/v1".FeatureGate does not implement featuregates.FeatureGateAccess (missing method AreInitialFeatureGatesObserved)) (typecheck)
        "github.com/openshift/machine-config-operator/pkg/controller/template"
        ^

Here are some examples of how other operators have handled this. 

 This is a critical bug as oc adm node-logs runs as part of must-gather and debugging node issues with TechPreview jobs in CI is impossible without this working.

1. Proposed title of this feature request
Support new Azure LoadBalancer 100min idle TCP timeout

2. What is the nature and description of the request ?
When provisioning a service of type LoadBalancer for an OCP cluster on Azure, it is possible to customize the TCP idle timeout (in minutes) using the LoadBalancer annotation 'service.beta.kubernetes.io/azure-load-balancer-tcp-idle-timeout'.

Currently, the min and max values are hardcoded to 4 and 30 respectively, in both the legacy Azure cloud provider implementation and cloud-provider-azure.

Recently Azure upgraded its implementation to support a maximum idle timeout of 100 minutes; the corresponding documentation ("Configure TCP reset and idle timeout for Azure Load Balancer") should be updated soon. It is now possible to set an idle timeout of more than 30 minutes manually in the Azure portal or with the Azure CLI, but not from a Kubernetes LoadBalancer service, as the max value is still 30 minutes in the Kubernetes code.
Error message returned is

`Warning  SyncLoadBalancerFailed  2s (x3 over 18s)    service-controller  Error syncing load balancer: failed to ensure load balancer: idle timeout value must be a whole number representing minutes between 4 and 30`

3. Why does the customer need this? (List the business requirements here)
The customer is migrating workloads from an on-premise datacenter to Azure. Using an idle timeout of more than 30 minutes is critical to migrate some of the customer's links to Azure, and the migration is blocked until this is supported by OpenShift.

4. List any affected packages or components.
Azure cloud controller
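For illustration only (the service name and ports are hypothetical), the annotation from this request applied to a LoadBalancer Service; today any value above 30 is rejected with the error quoted above:

```
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    # requested: allow values up to 100 once the new Azure limit is supported
    service.beta.kubernetes.io/azure-load-balancer-tcp-idle-timeout: "100"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 443
    targetPort: 8443
```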

Description of problem:

If the HyperShift operator is installed onto a cluster, it creates VPC Endpoint Services fronting the hosted Kubernetes API Server for downstream HyperShift clusters to connect to. These VPC Endpoint Services are tagged such that the uninstaller would attempt to action them:

"kubernetes.io/cluster/${ID}: owned"

However they cannot be deleted until all active VPC Endpoint Connections are rejected - the uninstaller should be able to do this.

Version-Release number of selected component (if applicable):

4.12 (but shouldn't be version-specific)

How reproducible:

100%

Steps to Reproduce:

1. Create an NLB + VPC Endpoint Service in the same VPC as a cluster
2. Tag it accordingly and create a VPC Endpoint connection to it

Actual results:

The uninstaller will not be able to delete the VPC Endpoint Service + the NLB that the VPC Endpoint Service is fronting

Expected results:

The VPC Endpoint Service can be completely cleaned up, which would allow the NLB to be cleaned up

Additional info:

 

Please review the following PR: https://github.com/openshift/images/pull/132

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The following changes are required for the openshift/route-controller-manager#22 refactoring (a rough sketch of the first and last items follows below):

  • add POD_NAME to the route-controller-manager deployment
  • introduce route-controller-defaultconfig and customize the lease name openshift-route-controllers to override the default one supplied by library-go
  • add RBAC for infrastructures, which is used by library-go for configuring leader election
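As a rough sketch only (not the actual manifests; names and image are placeholders), the POD_NAME environment variable and the infrastructures RBAC could look like this:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: route-controller-manager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: route-controller-manager
  template:
    metadata:
      labels:
        app: route-controller-manager
    spec:
      containers:
      - name: route-controller-manager
        image: quay.io/example/route-controller-manager:latest   # placeholder image
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name   # expose the pod name via the downward API
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: route-controller-manager-infrastructures
rules:
- apiGroups: ["config.openshift.io"]
  resources: ["infrastructures"]
  verbs: ["get", "list", "watch"]
```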

Description of problem:

The link to the OpenShift Route from the service breaks because of the hardcoded value of targetPort. If the targetPort is changed, the route still points to the old port value, since it is hardcoded.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Install the latest available version of Openshift Pipelines
2. Create the pipeline and triggerbinding using the attached files
3. Add trigger to the created pipeline from devconsole UI, select the above created triggerbinding while adding trigger
4. Trigger an event using the curl command curl -X POST -d '{ "url": "https://www.github.com/VeereshAradhya/cli" }' -H 'Content-Type: application/json' <route> and make sure that the pipelinerun gets started
5. Update the targetPort in the svc from 8080 to 8000
6. Again use the above curl command to trigger one more event

Actual results:

The curl command throws an error.

Expected results:

The curl command should be successful and the pipelinerun should get started successfully

Additional info:

Error:
curl -X POST -d '{ "url": "https://www.github.com/VeereshAradhya/cli" }' -H 'Content-Type: application/json' http://el-event-listener-3o9zcv-test-devconsole.apps.ve412psi.psi.ospqa.com
<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1">    <style type="text/css">
      body {
        font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
        line-height: 1.66666667;
        font-size: 16px;
        color: #333;
        background-color: #fff;
        margin: 2em 1em;
      }
      h1 {
        font-size: 28px;
        font-weight: 400;
      }
      p {
        margin: 0 0 10px;
      }
      .alert.alert-info {
        background-color: #F0F0F0;
        margin-top: 30px;
        padding: 30px;
      }
      .alert p {
        padding-left: 35px;
      }
      ul {
        padding-left: 51px;
        position: relative;
      }
      li {
        font-size: 14px;
        margin-bottom: 1em;
      }
      p.info {
        position: relative;
        font-size: 20px;
      }
      p.info:before, p.info:after {
        content: "";
        left: 0;
        position: absolute;
        top: 0;
      }
      p.info:before {
        background: #0066CC;
        border-radius: 16px;
        color: #fff;
        content: "i";
        font: bold 16px/24px serif;
        height: 24px;
        left: 0px;
        text-align: center;
        top: 4px;
        width: 24px;
      }      @media (min-width: 768px) {
        body {
          margin: 6em;
        }
      }
    </style>
  </head>
  <body>
    <div>
      <h1>Application is not available</h1>
      <p>The application is currently not serving requests at this endpoint. It may not have been started or is still starting.</p>      <div class="alert alert-info">
        <p class="info">
          Possible reasons you are seeing this page:
        </p>
        <ul>
          <li>
            <strong>The host doesn't exist.</strong>
            Make sure the hostname was typed correctly and that a route matching this hostname exists.
          </li>
          <li>
            <strong>The host exists, but doesn't have a matching path.</strong>
            Check if the URL path was typed correctly and that the route was created using the desired path.
          </li>
          <li>
            <strong>Route and path matches, but all pods are down.</strong>
            Make sure that the resources exposed by this route (pods, services, deployment configs, etc) have at least one pod running.
          </li>
        </ul>
      </div>
    </div>
  </body>
</html>

Note:

The above scenario works fine if we create the triggers using YAML files instead of the devconsole UI.
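One possible direction (not necessarily the fix the UI will take; the names here are hypothetical) is for the generated Route to reference the service port by name rather than by a hardcoded number, so changing the service's targetPort does not break the route:

```
apiVersion: v1
kind: Service
metadata:
  name: el-event-listener
spec:
  selector:
    app: el-event-listener
  ports:
  - name: http-listener
    port: 8080
    targetPort: 8000          # can change without touching the Route
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: el-event-listener
spec:
  to:
    kind: Service
    name: el-event-listener
  port:
    targetPort: http-listener  # reference the named service port, not a number
```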

Description of problem:

While trying to update build01 from 4.13.rc2 to 4.13.rc3, the MCO degraded upon trying to upgrade the first master node. The error was:

E0414 15:42:29.597388 2323546 writer.go:200] Marking Degraded due to: exit status 1

Which I mapped to this line:
https://github.com/openshift/machine-config-operator/blob/release-4.13/pkg/daemon/update.go#L1551

I think this error can be improved since it is a bit confusing, but that's not the main problem.

We noticed that the actual issue was an existing "/home/core/.ssh" directory, seemingly created by 4.13.rc2 (but possibly earlier), that belonged to the root user. As a result, when we attempted to create the folder via runuser core by hand, it failed with permission denied (and since we return the exec status, I think it just returned status 1 and not this error message).

I am currently not sure if we introduced something that caused this issue. There has been an SSH key (only on the master pool) in that build01 cluster for 600 days already, so it must have worked in the past?

Workaround is to delete the .ssh folder and let the MCD recreate it

Version-Release number of selected component (if applicable):

4.13.rc3

How reproducible:

Uncertain, but it shouldn't be very high; otherwise we would have run into this in CI much more often, I think.

Steps to Reproduce:

1. create some 4.12 cluster with sshkey
2. upgrade to 4.13.rc2
3. upgrade to 4.13.rc3

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

When using the --oci-registries-config flag explicitly or getting registries.conf from the environment, execution time when processing related images via the addRelatedImageToMapping function serially can drastically impact performance depending on the number of images involved. In my testing of a large catalog, there were approximately 470 images and this took approximately 13 minutes. This processing occurs prior to letting the underlying oc mirror code plan out the images that should be mirrored. Actual planning time is consistent at around 1 min 30 seconds.

The cause of this is due to the need to determine mirrors for each one of the related images based on the configuration provided in registries.conf, and this action is done serially in a loop. If I introduce parallel execution, the processing time for addRelatedImageToMapping is reduced from ~13 min to ~14 seconds.

Version-Release number of selected component (if applicable): 4.13

How reproducible: always

Steps to Reproduce:

Note: the catalog used here is publicly available, but the related images are not so this may be difficult to reproduce.

  1. Copy catalog image to disk in OCI layout
    mkdir -p /tmp/oci/registriesconf/performance
    skopeo --override-os linux copy docker://quay.io/jhunkins/ocp13762:v1 oci:///tmp/oci/registriesconf/performance --format v2s2
    
  2. Create a ~/.config/containers/registries.conf file with this content
    [[registry]]
    location = "icr.io/cpopen"
    insecure = false
    blocked = false
    mirror-by-digest-only = true
    prefix = ""
    [[registry.mirror]]
      location = "quay.io/jhunkins"
      insecure = false
    
  3. Create a ISC [path to isc]/isc-registriesconf-performance.yaml
    kind: ImageSetConfiguration
    apiVersion: mirror.openshift.io/v1alpha2
    mirror: 
      operators: 
      - catalog: oci:///tmp/oci/registriesconf/performance
        full: true
        targetTag: latest
        targetCatalog: ibm-catalog
    storageConfig: 
      local: 
        path: /tmp/oc-mirror-temp
    
  4. run oc mirror with OCI flags (running with dry run is sufficient to replicate this issue)
    oc mirror --config [path to isc]/isc-registriesconf-performance.yaml --include-local-oci-catalogs --oci-insecure-signature-policy --dest-use-http docker://localhost:5000/oci --skip-cleanup --dry-run
    

Actual results:

roughly 13 minutes elapses before the planning phase begins

Expected results:

much faster execution before the planning phase begins

Additional info:

I intend to create a PR which adds parallel execution around the addRelatedImageToMapping function

Description of problem:
The word "using" is duplicated in a log message.

log.Infof("For node %s selected peer address %s using using OVN annotations.", node.Name, addr)

Version-Release number of selected component (if applicable):

4.14

How reproducible:
always

Steps to Reproduce:

1. code review
2.
3.

Actual results:

log.Infof("For node %s selected peer address %s using using OVN annotations.", node.Name, addr)
 

Expected results:

log.Infof("For node %s selected peer address %s using OVN annotations.", node.Name, addr)
 

Additional info:

This is a clone of issue OCPBUGS-18339. The following is the description of the original issue:

Description of problem:

The vSphere code references a Red Hat solution that has been retired in favour of its content being merged into the official documentation.

https://github.com/openshift/machine-api-operator/blob/master/pkg/controller/vsphere/reconciler.go#L827

Version-Release number of selected component (if applicable):

4.11-4.13 + main

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

UI presents a message with solution customers can not access.

Hardware lower than 15 is not supported, clone stopped. Detected machine template version is 13. Please update machine template: https://access.redhat.com/articles/6090681

Expected results:

It should reference the official documentation: https://docs.openshift.com/container-platform/4.12/updating/updating-hardware-on-nodes-running-on-vsphere.html

Additional info:

 

This is a clone of issue OCPBUGS-23082. The following is the description of the original issue:

Description of problem:

From our initial investigation, it seems like the network-node-identity component does not need management cluster access in Hypershift

We were looking at:
https://github.com/openshift/cluster-network-operator/blob/release-4.14/bindata/network/node-identity/managed/node-identity.yaml

For the webhook and approver container: https://github.com/openshift/ovn-kubernetes/blob/release-4.14/go-controller/cmd/ovnkube-identity/ovnkubeidentity.go

For the token minter container: https://github.com/openshift/hypershift/blob/release-4.14/token-minter/tokenminter.go

We also tested by disabling automountServiceAccountToken and things still seemed to be functioning.
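A minimal sketch of what was tested (not the actual CNO-rendered manifest; names and image are placeholders):

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: network-node-identity
spec:
  replicas: 1
  selector:
    matchLabels:
      app: network-node-identity
  template:
    metadata:
      labels:
        app: network-node-identity
    spec:
      # opt the pod out of mounting the management-cluster service account token
      automountServiceAccountToken: false
      containers:
      - name: webhook
        image: quay.io/example/ovnkube-identity:latest   # placeholder image
```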

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Deploy a 4.14 hosted cluster
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

The agent integration tests are failing with different errors when run multiple times locally:

Local Run 1:

 

level=fatal msg=failed to fetch Agent Installer PXE Files: failed to fetch dependency of "Agent Installer PXE Files": failed to generate asset "Agent Installer Artifacts": lstat /home/rwsu/.cache/agent/files_cache/libnmstate.so.2: no such file or directory
[exit status 1]
FAIL: testdata/agent/pxe/configurations/sno.txt:3: unexpected command failure

 

Local Run 2:

 

level=fatal msg=failed to fetch Agent Installer PXE Files: failed to fetch dependency of "Agent Installer PXE Files": failed to generate asset "Agent Installer Artifacts": file /usr/bin/agent-tui was not found
[exit status 1]
FAIL: testdata/agent/pxe/configurations/sno.txt:3: unexpected command failure

 

In CI (https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/7299/pull-ci-openshift-installer-master-agent-integration-tests/1677347591739674624), it has failed in this PR multiple times with this error:

level=fatal msg=failed to fetch Agent Installer PXE Files: failed to fetch dependency of "Agent Installer PXE Files": failed to generate asset "Agent Installer Artifacts": lstat /.cache/agent/files_cache/agent-tui: no such file or directory   32  [exit status 1]   33  FAIL: testdata/agent/pxe/configurations/sno.txt:3: unexpected command failure 

I believe the issue is that the integration tests are running in parallel, and the extractFileFromImage function in pkg/asset/agent/image/oc.go is problematic because the cache is cleared and files are then extracted to the same path. When the tests run in parallel, another test could clear the cached files, so when the current test tries to read a file from the cache directory, it has disappeared.

Adding

-parallel 1

to ./hack/go-integration-test.sh eliminates the errors, which is why I think it is a concurrency issue.
 

Description of problem:

After a replace upgrade from one OCP 4.14 image to another 4.14 image, the first node is NotReady.

jiezhao-mac:hypershift jiezhao$ oc get node --kubeconfig=hostedcluster.kubeconfig 
NAME                     STATUS   ROLES  AGE   VERSION
ip-10-0-128-175.us-east-2.compute.internal  Ready   worker  72m   v1.26.2+06e8c46
ip-10-0-134-164.us-east-2.compute.internal  Ready   worker  68m   v1.26.2+06e8c46
ip-10-0-137-194.us-east-2.compute.internal  Ready   worker  77m   v1.26.2+06e8c46
ip-10-0-141-231.us-east-2.compute.internal  NotReady  worker  9m54s  v1.26.2+06e8c46

- lastHeartbeatTime: "2023-03-21T19:48:46Z"
  lastTransitionTime: "2023-03-21T19:42:37Z"
  message: 'container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady
   message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/.
   Has your network provider started?'
  reason: KubeletNotReady
  status: "False"
  type: Ready

Events:
 Type   Reason          Age         From          Message
 ----   ------          ----        ----          -------
 Normal  Starting         11m         kubelet        Starting kubelet.
 Normal  NodeHasSufficientMemory 11m (x2 over 11m)  kubelet        Node ip-10-0-141-231.us-east-2.compute.internal status is now: NodeHasSufficientMemory
 Normal  NodeHasNoDiskPressure  11m (x2 over 11m)  kubelet        Node ip-10-0-141-231.us-east-2.compute.internal status is now: NodeHasNoDiskPressure
 Normal  NodeHasSufficientPID   11m (x2 over 11m)  kubelet        Node ip-10-0-141-231.us-east-2.compute.internal status is now: NodeHasSufficientPID
 Normal  NodeAllocatableEnforced 11m         kubelet        Updated Node Allocatable limit across pods
 Normal  Synced          11m         cloud-node-controller Node synced successfully
 Normal  RegisteredNode      11m         node-controller    Node ip-10-0-141-231.us-east-2.compute.internal event: Registered Node ip-10-0-141-231.us-east-2.compute.internal in Controller
 Warning ErrorReconcilingNode   17s (x30 over 11m) controlplane      nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation

ovnkube-master log:

I0321 20:55:16.270197       1 default_network_controller.go:667] Node add failed for ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:55:16.270209       1 obj_retry.go:326] Retry add failed for *v1.Node ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:55:16.270273       1 event.go:285] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-141-231.us-east-2.compute.internal", UID:"621e6289-ca5a-4e17-afff-5b49961cfb38", APIVersion:"v1", ResourceVersion:"52970", FieldPath:""}): type: 'Warning' reason: 'ErrorReconcilingNode' nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:55:17.851497       1 master.go:719] Adding or Updating Node "ip-10-0-137-194.us-east-2.compute.internal"
I0321 20:55:25.965132       1 master.go:719] Adding or Updating Node "ip-10-0-128-175.us-east-2.compute.internal"
I0321 20:55:45.928694       1 client.go:783]  "msg"="transacting operations" "database"="OVN_Northbound" "operations"="[{Op:update Table:NB_Global Row:map[options:{GoMap:map[e2e_timestamp:1679432145 mac_prefix:2e:f9:d8 max_tunid:16711680 northd_internal_version:23.03.1-20.27.0-70.6 northd_probe_interval:5000 svc_monitor_mac:fe:cb:72:cf:f8:5f use_logical_dp_groups:true]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {c8b24290-296e-44a2-a4d0-02db7e312614}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]"
I0321 20:55:46.270129       1 obj_retry.go:265] Retry object setup: *v1.Node ip-10-0-141-231.us-east-2.compute.internal
I0321 20:55:46.270154       1 obj_retry.go:319] Adding new object: *v1.Node ip-10-0-141-231.us-east-2.compute.internal
I0321 20:55:46.270164       1 master.go:719] Adding or Updating Node "ip-10-0-141-231.us-east-2.compute.internal"
I0321 20:55:46.270201       1 default_network_controller.go:667] Node add failed for ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:55:46.270209       1 obj_retry.go:326] Retry add failed for *v1.Node ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:55:46.270284       1 event.go:285] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-141-231.us-east-2.compute.internal", UID:"621e6289-ca5a-4e17-afff-5b49961cfb38", APIVersion:"v1", ResourceVersion:"52970", FieldPath:""}): type: 'Warning' reason: 'ErrorReconcilingNode' nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:55:52.916512       1 reflector.go:559] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Namespace total 5 items received
I0321 20:56:06.910669       1 reflector.go:559] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Pod total 12 items received
I0321 20:56:15.928505       1 client.go:783]  "msg"="transacting operations" "database"="OVN_Northbound" "operations"="[{Op:update Table:NB_Global Row:map[options:{GoMap:map[e2e_timestamp:1679432175 mac_prefix:2e:f9:d8 max_tunid:16711680 northd_internal_version:23.03.1-20.27.0-70.6 northd_probe_interval:5000 svc_monitor_mac:fe:cb:72:cf:f8:5f use_logical_dp_groups:true]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {c8b24290-296e-44a2-a4d0-02db7e312614}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]"
I0321 20:56:16.269611       1 obj_retry.go:265] Retry object setup: *v1.Node ip-10-0-141-231.us-east-2.compute.internal
I0321 20:56:16.269637       1 obj_retry.go:319] Adding new object: *v1.Node ip-10-0-141-231.us-east-2.compute.internal
I0321 20:56:16.269646       1 master.go:719] Adding or Updating Node "ip-10-0-141-231.us-east-2.compute.internal"
I0321 20:56:16.269688       1 default_network_controller.go:667] Node add failed for ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:56:16.269697       1 obj_retry.go:326] Retry add failed for *v1.Node ip-10-0-141-231.us-east-2.compute.internal, will try again later: nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation
I0321 20:56:16.269724       1 event.go:285] Event(v1.ObjectReference{Kind:"Node", Namespace:"", Name:"ip-10-0-141-231.us-east-2.compute.internal", UID:"621e6289-ca5a-4e17-afff-5b49961cfb38", APIVersion:"v1", ResourceVersion:"52970", FieldPath:""}): type: 'Warning' reason: 'ErrorReconcilingNode' nodeAdd: error adding node "ip-10-0-141-231.us-east-2.compute.internal": could not find "k8s.ovn.org/node-subnets" annotation

cluster-network-operator log:

I0321 21:03:38.487602       1 log.go:198] Set operator conditions:
- lastTransitionTime: "2023-03-21T17:39:21Z"
  status: "False"
  type: ManagementStateDegraded
- lastTransitionTime: "2023-03-21T19:53:10Z"
  message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making
    progress - last change 2023-03-21T19:42:39Z
  reason: RolloutHung
  status: "True"
  type: Degraded
- lastTransitionTime: "2023-03-21T17:39:21Z"
  status: "True"
  type: Upgradeable
- lastTransitionTime: "2023-03-21T19:42:39Z"
  message: |-
    DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
  reason: Deploying
  status: "True"
  type: Progressing
- lastTransitionTime: "2023-03-21T17:39:26Z"
  status: "True"
  type: Available
I0321 21:03:38.488312       1 log.go:198] Skipping reconcile of Network.operator.openshift.io: spec unchanged
I0321 21:03:38.499825       1 log.go:198] Set ClusterOperator conditions:
- lastTransitionTime: "2023-03-21T17:39:21Z"
  status: "False"
  type: ManagementStateDegraded
- lastTransitionTime: "2023-03-21T19:53:10Z"
  message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making
    progress - last change 2023-03-21T19:42:39Z
  reason: RolloutHung
  status: "True"
  type: Degraded
- lastTransitionTime: "2023-03-21T17:39:21Z"
  status: "True"
  type: Upgradeable
- lastTransitionTime: "2023-03-21T19:42:39Z"
  message: |-
    DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
  reason: Deploying
  status: "True"
  type: Progressing
- lastTransitionTime: "2023-03-21T17:39:26Z"
  status: "True"
  type: Available
I0321 21:03:38.571013       1 log.go:198] Set HostedControlPlane conditions:
- lastTransitionTime: "2023-03-21T17:38:24Z"
  message: All is well
  observedGeneration: 3
  reason: AsExpected
  status: "True"
  type: ValidAWSIdentityProvider
- lastTransitionTime: "2023-03-21T17:37:06Z"
  message: Configuration passes validation
  observedGeneration: 3
  reason: AsExpected
  status: "True"
  type: ValidHostedControlPlaneConfiguration
- lastTransitionTime: "2023-03-21T19:24:24Z"
  message: ""
  observedGeneration: 3
  reason: QuorumAvailable
  status: "True"
  type: EtcdAvailable
- lastTransitionTime: "2023-03-21T17:38:23Z"
  message: Kube APIServer deployment is available
  observedGeneration: 3
  reason: AsExpected
  status: "True"
  type: KubeAPIServerAvailable
- lastTransitionTime: "2023-03-21T20:26:29Z"
  message: ""
  observedGeneration: 3
  reason: AsExpected
  status: "False"
  type: Degraded
- lastTransitionTime: "2023-03-21T17:37:11Z"
  message: All is well
  observedGeneration: 3
  reason: AsExpected
  status: "True"
  type: InfrastructureReady
- lastTransitionTime: "2023-03-21T17:37:06Z"
  message: External DNS is not configured
  observedGeneration: 3
  reason: StatusUnknown
  status: Unknown
  type: ExternalDNSReachable
- lastTransitionTime: "2023-03-21T19:24:24Z"
  message: ""
  observedGeneration: 3
  reason: AsExpected
  status: "True"
  type: Available
- lastTransitionTime: "2023-03-21T17:37:06Z"
  message: Reconciliation active on resource
  observedGeneration: 3
  reason: AsExpected
  status: "True"
  type: ReconciliationActive
- lastTransitionTime: "2023-03-21T17:38:25Z"
  message: All is well
  reason: AsExpected
  status: "True"
  type: AWSDefaultSecurityGroupCreated
- lastTransitionTime: "2023-03-21T19:30:54Z"
  message: 'Error while reconciling 4.14.0-0.nightly-2023-03-20-201450: the cluster
    operator network is degraded'
  observedGeneration: 3
  reason: ClusterOperatorDegraded
  status: "False"
  type: ClusterVersionProgressing
- lastTransitionTime: "2023-03-21T17:39:11Z"
  message: Condition not found in the CVO.
  observedGeneration: 3
  reason: StatusUnknown
  status: Unknown
  type: ClusterVersionUpgradeable
- lastTransitionTime: "2023-03-21T17:44:05Z"
  message: Done applying 4.14.0-0.nightly-2023-03-20-201450
  observedGeneration: 3
  reason: FromClusterVersion
  status: "True"
  type: ClusterVersionAvailable
- lastTransitionTime: "2023-03-21T19:55:15Z"
  message: Cluster operator network is degraded
  observedGeneration: 3
  reason: ClusterOperatorDegraded
  status: "True"
  type: ClusterVersionFailing
- lastTransitionTime: "2023-03-21T17:39:11Z"
  message: Payload loaded version="4.14.0-0.nightly-2023-03-20-201450" image="registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-03-20-201450"
    architecture="amd64"
  observedGeneration: 3
  reason: PayloadLoaded
  status: "True"
  type: ClusterVersionReleaseAccepted
- lastTransitionTime: "2023-03-21T17:39:21Z"
  message: ""
  reason: AsExpected
  status: "False"
  type: network.operator.openshift.io/ManagementStateDegraded
- lastTransitionTime: "2023-03-21T19:53:10Z"
  message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making
    progress - last change 2023-03-21T19:42:39Z
  reason: RolloutHung
  status: "True"
  type: network.operator.openshift.io/Degraded
- lastTransitionTime: "2023-03-21T17:39:21Z"
  message: ""
  reason: AsExpected
  status: "True"
  type: network.operator.openshift.io/Upgradeable
- lastTransitionTime: "2023-03-21T19:42:39Z"
  message: |-
    DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-multus/multus" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
  reason: Deploying
  status: "True"
  type: network.operator.openshift.io/Progressing
- lastTransitionTime: "2023-03-21T17:39:27Z"
  message: ""
  reason: AsExpected
  status: "True"
  type: network.operator.openshift.io/Available
I0321 21:03:39.450912       1 pod_watcher.go:125] Operand /, Kind= openshift-multus/multus updated, re-generating status
I0321 21:03:39.450953       1 pod_watcher.go:125] Operand /, Kind= openshift-multus/multus updated, re-generating status
I0321 21:03:39.493206       1 log.go:198] Set operator conditions:
- lastTransitionTime: "2023-03-21T17:39:21Z"
  status: "False"
  type: ManagementStateDegraded
- lastTransitionTime: "2023-03-21T19:53:10Z"
  message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making
    progress - last change 2023-03-21T19:42:39Z
  reason: RolloutHung
  status: "True"
  type: Degraded
- lastTransitionTime: "2023-03-21T17:39:21Z"
  status: "True"
  type: Upgradeable
- lastTransitionTime: "2023-03-21T19:42:39Z"
  message: |-
    DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
  reason: Deploying
  status: "True"
  type: Progressing
- lastTransitionTime: "2023-03-21T17:39:26Z"
  status: "True"
  type: Available
I0321 21:03:39.494050       1 log.go:198] Skipping reconcile of Network.operator.openshift.io: spec unchanged
I0321 21:03:39.508538       1 log.go:198] Set ClusterOperator conditions:
- lastTransitionTime: "2023-03-21T17:39:21Z"
  status: "False"
  type: ManagementStateDegraded
- lastTransitionTime: "2023-03-21T19:53:10Z"
  message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making
    progress - last change 2023-03-21T19:42:39Z
  reason: RolloutHung
  status: "True"
  type: Degraded
- lastTransitionTime: "2023-03-21T17:39:21Z"
  status: "True"
  type: Upgradeable
- lastTransitionTime: "2023-03-21T19:42:39Z"
  message: |-
    DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
  reason: Deploying
  status: "True"
  type: Progressing
- lastTransitionTime: "2023-03-21T17:39:26Z"
  status: "True"
  type: Available
I0321 21:03:39.684429       1 log.go:198] Set HostedControlPlane conditions:
- lastTransitionTime: "2023-03-21T17:38:24Z"
  message: All is well
  observedGeneration: 3
  reason: AsExpected
  status: "True"
  type: ValidAWSIdentityProvider
- lastTransitionTime: "2023-03-21T17:37:06Z"
  message: Configuration passes validation
  observedGeneration: 3
  reason: AsExpected
  status: "True"
  type: ValidHostedControlPlaneConfiguration
- lastTransitionTime: "2023-03-21T19:24:24Z"
  message: ""
  observedGeneration: 3
  reason: QuorumAvailable
  status: "True"
  type: EtcdAvailable
- lastTransitionTime: "2023-03-21T17:38:23Z"
  message: Kube APIServer deployment is available
  observedGeneration: 3
  reason: AsExpected
  status: "True"
  type: KubeAPIServerAvailable
- lastTransitionTime: "2023-03-21T20:26:29Z"
  message: ""
  observedGeneration: 3
  reason: AsExpected
  status: "False"
  type: Degraded
- lastTransitionTime: "2023-03-21T17:37:11Z"
  message: All is well
  observedGeneration: 3
  reason: AsExpected
  status: "True"
  type: InfrastructureReady
- lastTransitionTime: "2023-03-21T17:37:06Z"
  message: External DNS is not configured
  observedGeneration: 3
  reason: StatusUnknown
  status: Unknown
  type: ExternalDNSReachable
- lastTransitionTime: "2023-03-21T19:24:24Z"
  message: ""
  observedGeneration: 3
  reason: AsExpected
  status: "True"
  type: Available
- lastTransitionTime: "2023-03-21T17:37:06Z"
  message: Reconciliation active on resource
  observedGeneration: 3
  reason: AsExpected
  status: "True"
  type: ReconciliationActive
- lastTransitionTime: "2023-03-21T17:38:25Z"
  message: All is well
  reason: AsExpected
  status: "True"
  type: AWSDefaultSecurityGroupCreated
- lastTransitionTime: "2023-03-21T19:30:54Z"
  message: 'Error while reconciling 4.14.0-0.nightly-2023-03-20-201450: the cluster
    operator network is degraded'
  observedGeneration: 3
  reason: ClusterOperatorDegraded
  status: "False"
  type: ClusterVersionProgressing
- lastTransitionTime: "2023-03-21T17:39:11Z"
  message: Condition not found in the CVO.
  observedGeneration: 3
  reason: StatusUnknown
  status: Unknown
  type: ClusterVersionUpgradeable
- lastTransitionTime: "2023-03-21T17:44:05Z"
  message: Done applying 4.14.0-0.nightly-2023-03-20-201450
  observedGeneration: 3
  reason: FromClusterVersion
  status: "True"
  type: ClusterVersionAvailable
- lastTransitionTime: "2023-03-21T19:55:15Z"
  message: Cluster operator network is degraded
  observedGeneration: 3
  reason: ClusterOperatorDegraded
  status: "True"
  type: ClusterVersionFailing
- lastTransitionTime: "2023-03-21T17:39:11Z"
  message: Payload loaded version="4.14.0-0.nightly-2023-03-20-201450" image="registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-03-20-201450"
    architecture="amd64"
  observedGeneration: 3
  reason: PayloadLoaded
  status: "True"
  type: ClusterVersionReleaseAccepted
- lastTransitionTime: "2023-03-21T17:39:21Z"
  message: ""
  reason: AsExpected
  status: "False"
  type: network.operator.openshift.io/ManagementStateDegraded
- lastTransitionTime: "2023-03-21T19:53:10Z"
  message: DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" rollout is not making
    progress - last change 2023-03-21T19:42:39Z
  reason: RolloutHung
  status: "True"
  type: network.operator.openshift.io/Degraded
- lastTransitionTime: "2023-03-21T17:39:21Z"
  message: ""
  reason: AsExpected
  status: "True"
  type: network.operator.openshift.io/Upgradeable
- lastTransitionTime: "2023-03-21T19:42:39Z"
  message: |-
    DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)
    DaemonSet "/openshift-ovn-kubernetes/ovnkube-node" is not available (awaiting 1 nodes)
  reason: Deploying
  status: "True"
  type: network.operator.openshift.io/Progressing
- lastTransitionTime: "2023-03-21T17:39:27Z"
  message: ""
  reason: AsExpected
  status: "True"
  type: network.operator.openshift.io/Available

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Set up a management cluster on 4.13
2. Bring up the hosted cluster and nodepool on 4.14.0-0.nightly-2023-03-19-234132
3. Upgrade the hosted cluster to 4.14.0-0.nightly-2023-03-20-201450
4. Replace-upgrade the nodepool to 4.14.0-0.nightly-2023-03-20-201450

Actual results:

The first node is stuck in NotReady.

Expected results:

All nodes should be Ready

Additional info:

No issue was seen with a replace upgrade from 4.13 to 4.14.

Description of problem:

If the user specifies a DNS name in an EgressNetworkPolicy for which the upstream server returns a truncated DNS response, openshift-sdn does not fall back to TCP as expected but simply treats this as a failure.
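For illustration, a minimal EgressNetworkPolicy of the kind that hits this code path might look as follows; the namespace and domain name are placeholders, not values from the original report:

apiVersion: network.openshift.io/v1
kind: EgressNetworkPolicy
metadata:
  name: default
  namespace: example-project               # hypothetical namespace
spec:
  egress:
  # Allow egress only to a domain whose nameserver returns a truncated UDP
  # response; resolving it requires a retry over TCP, which openshift-sdn skips.
  - type: Allow
    to:
      dnsName: large-records.example.com   # hypothetical domain
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0

With such a policy in place, resolution of the allowed domain is reported as a failure instead of being retried over TCP.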

Version-Release number of selected component (if applicable):

4.11 (originally reproduced on 4.9)

How reproducible:

Always

Steps to Reproduce:

1. Set up an EgressNetworkPolicy that points to a domain for which a truncated response is returned when queried via UDP.
2.
3.

Actual results:

Error, DNS resolution not completed.

Expected results:

Request retried via TCP and succeeded.

Additional info:

In comments.

Description of problem:

In the release-4.14 branch of the cluster-capi-operator the manifests for Azure are out of date.

Azure is not actually deployed, so the bug has no runtime impact.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

ODC automatically loads all Camel K Kamelets from the openshift-operators namespace in order to display those resources in the event sources/sinks catalog. This does not work when the Camel K operator is installed in another namespace (e.g. in Developer Sandbox the Camel K operator had to be installed in the camel-k-operator namespace).

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Display event sources/sinks catalog in ODC on a cluster where Camel K is installed in a namespace other than openshift-operators (e.g. Developer Sandbox)

Steps to Reproduce:

1. Make sure to have a cluster where Knative eventing is available
2. Install Camel K operator in camel-k-operator namespace (e.g. via OLM)
3. Display the event source/sink catalog in ODC

Actual results:

No Kamelets are visible in the catalog

Expected results:

All Kamelets (automatically installed with the operator) should be visible as potential event sources/sinks in the catalog

Additional info:

The Kamelet resources are being watched in two namespaces (the current user namespace and the global operator namespace): https://github.com/openshift/console/blob/master/frontend/packages/knative-plugin/src/hooks/useKameletsData.ts#L12-L28

We should allow the global namespace to be configured, or also add the camel-k-operator namespace as a third place to look for installed Kamelets.

Description of the problem:

InfraEnv creation data is missing.

 

How reproducible:

Data is propagated only on InfraEnv update.

 

Steps to reproduce:

1. Create a new cluster

2. Check the Elastic data: some special feature is missing

 

Description of problem:

Update to use Jenkins 4.13 images to address CVEs

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Cluster deployment of 4.14.0-0.nightly-2023-06-20-065807 fails as worker nodes are stuck in INSPECTING state despite being reported as MANAGEABLE

From the logs of machine-controller container in machine-api-controllers pod:

I0621 06:12:02.779472       1 request.go:682] Waited for 2.095824347s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/performance.openshift.io/v2?timeout=32s
E0621 06:12:02.781540       1 logr.go:270] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"Metal3Remediation\" in version \"infrastructure.cluster.x-k8s.io/v1beta1\""  "kind"={"Group":"infrastructure.cluster.x-k8s.io","Kind":"Metal3Remediation"}
I0621 06:12:02.783418       1 controller.go:179] kni-qe-4-tj65t-worker-0-h6s8g: reconciling Machine
2023/06/21 06:12:02 Checking if machine kni-qe-4-tj65t-worker-0-h6s8g exists.
2023/06/21 06:12:02 Machine kni-qe-4-tj65t-worker-0-h6s8g does not exist.
I0621 06:12:02.783439       1 controller.go:372] kni-qe-4-tj65t-worker-0-h6s8g: reconciling machine triggers idempotent create
2023/06/21 06:12:02 Creating machine kni-qe-4-tj65t-worker-0-h6s8g
2023/06/21 06:12:02 0 hosts available while choosing host for machine 'kni-qe-4-tj65t-worker-0-h6s8g'
2023/06/21 06:12:02 No available BareMetalHost found
W0621 06:12:02.783735       1 controller.go:374] kni-qe-4-tj65t-worker-0-h6s8g: failed to create machine: requeue in: 30s
I0621 06:12:02.783748       1 controller.go:404] Actuator returned requeue-after error: requeue in: 30s
I0621 06:12:02.783780       1 controller.go:179] kni-qe-4-tj65t-worker-0-j259x: reconciling Machine
2023/06/21 06:12:02 Checking if machine kni-qe-4-tj65t-worker-0-j259x exists.
2023/06/21 06:12:02 Machine kni-qe-4-tj65t-worker-0-j259x does not exist.
I0621 06:12:02.783792       1 controller.go:372] kni-qe-4-tj65t-worker-0-j259x: reconciling machine triggers idempotent create
2023/06/21 06:12:02 Creating machine kni-qe-4-tj65t-worker-0-j259x
2023/06/21 06:12:02 0 hosts available while choosing host for machine 'kni-qe-4-tj65t-worker-0-j259x'
2023/06/21 06:12:02 No available BareMetalHost found
W0621 06:12:02.783971       1 controller.go:374] kni-qe-4-tj65t-worker-0-j259x: failed to create machine: requeue in: 30s
I0621 06:12:02.783976       1 controller.go:404] Actuator returned requeue-after error: requeue in: 30s

BMH Resources:

oc get bmh -A
NAMESPACE               NAME                 STATE                    CONSUMER                  ONLINE   ERROR   AGE
openshift-machine-api   openshift-master-0   externally provisioned   kni-qe-4-tj65t-master-0   true             175m
openshift-machine-api   openshift-master-1   externally provisioned   kni-qe-4-tj65t-master-1   true             175m
openshift-machine-api   openshift-master-2   externally provisioned   kni-qe-4-tj65t-master-2   true             175m
openshift-machine-api   openshift-worker-0   inspecting                                         true             175m
openshift-machine-api   openshift-worker-1   inspecting                                         true             175m

From Ironic:

baremetal node list
+--------------------------------------+------------------------------------------+--------------------------------------+-------------+--------------------+-------------+
| UUID                                 | Name                                     | Instance UUID                        | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------------------------------------+--------------------------------------+-------------+--------------------+-------------+
| 86f146e3-3e48-4a7a-b0ef-57c42083fc92 | openshift-machine-api~openshift-master-0 | 7eeb9e57-2df2-4710-82d9-d3f99a20348e | power on    | active             | False       |
| 2380f211-934f-4193-8cb1-d09e7008410c | openshift-machine-api~openshift-master-2 | fd856ced-2912-4800-848c-256c00a1fdb7 | power on    | active             | False       |
| 9ad70c58-de44-4d56-9304-4bf7c95de6fb | openshift-machine-api~openshift-master-1 | aa1a4c89-4215-44ec-90c7-9c5f3de95ab8 | power on    | active             | False       |
| bb5ea5f4-016c-4bdd-834d-61d575284bf3 | openshift-machine-api~openshift-worker-0 | None                                 | power off   | manageable         | False       |
| 3045a07a-09d6-43a0-ab9c-d856b54bad6c | openshift-machine-api~openshift-worker-1 | None                                 | power off   | manageable         | False       |
+--------------------------------------+------------------------------------------+--------------------------------------+-------------+--------------------+-------------+

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-20-065807

How reproducible:

So far, only once.

Steps to Reproduce:

1. Deploy a baremetal dual-stack cluster with day-1 networking

Actual results:

Deployment fails as worker nodes are not provisioned

Expected results:

Deployment succeeds

This is a clone of issue OCPBUGS-19037. The following is the description of the original issue:

The agent-interactive-console service is required by both sshd and systemd-logind, so if it exits with an error code there is no way to connect or log in to the box to debug.

Description of problem:

When installing a 3 master + 2 worker BM IPv6 cluster with proxy, worker BMHs are failing inspection with the message: "Could not contact ironic-inspector for version discovery: Unable to find a version discovery document". This causes the installation to fail due to nodes with worker role never joining the cluster. However, when installing with no workers, the issue does not reproduce and the cluster installs successfully.

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2023-01-04-203333

How reproducible:

100%

Steps to Reproduce:

1. Attempt to install an IPv6 cluster with 3 masters + 2 workers and proxy with baremetal installer

Actual results:

Installation never completes because a number of pods are in Pending status

Expected results:

Workers join the cluster and installation succeeds 

Additional info:

$ oc get events
LAST SEEN   TYPE     REASON              OBJECT                               MESSAGE
174m        Normal   InspectionError     baremetalhost/openshift-worker-0-1   Failed to inspect hardware. Reason: unable to start inspection: Could not contact ironic-inspector for version discovery: Unable to find a version discovery document at https://[fd2e:6f44:5dd8::37]:5050, the service is unavailable or misconfigured. Required version range (any - any), version hack disabled.
174m        Normal   InspectionError     baremetalhost/openshift-worker-0-0   Failed to inspect hardware. Reason: unable to start inspection: Could not contact ironic-inspector for version discovery: Unable to find a version discovery document at https://[fd2e:6f44:5dd8::37]:5050, the service is unavailable or misconfigured. Required version range (any - any), version hack disabled.
174m        Normal   InspectionStarted   baremetalhost/openshift-worker-0-0   Hardware inspection started
174m        Normal   InspectionStarted   baremetalhost/openshift-worker-0-1   Hardware inspection started

Description of problem

CI is flaky because the TestRouterCompressionOperation test fails.

Version-Release number of selected component (if applicable)

I have seen these failures on 4.14 CI jobs.

How reproducible

Presently, search.ci reports the following stats for the past 14 days:

Found in 7.71% of runs (16.58% of failures) across 402 total runs and 24 jobs (46.52% failed)

GCP is most impacted:

pull-ci-openshift-cluster-ingress-operator-master-e2e-gcp-operator (all) - 44 runs, 86% failed, 37% of failures match = 32% impact

Azure and AWS are also impacted:

pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-operator (all) - 36 runs, 64% failed, 43% of failures match = 28% impact
pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator (all) - 38 runs, 79% failed, 23% of failures match = 18% impact

Steps to Reproduce

1. Post a PR and have bad luck.
2. Check https://search.ci.openshift.org/?search=compression+error%3A+expected&maxAge=336h&context=1&type=build-log&name=cluster-ingress-operator&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job.

Actual results

The test fails:

TestAll/serial/TestRouterCompressionOperation 
=== RUN   TestAll/serial/TestRouterCompressionOperation
    router_compression_test.go:209: compression error: expected "gzip", got "" for canary route

Expected results

CI passes, or it fails on a different test.

Please review the following PR: https://github.com/openshift/cluster-api-provider-vsphere/pull/12

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-20350. The following is the description of the original issue:

Description of problem:

vSphere IPI installation fails with "panic: runtime error: invalid memory address or nil pointer dereference".

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Download 4.13 installation binary
2. Run openshift-install create cluster command. 

Actual results:

Error:

DEBUG   Generating Platform Provisioning Check...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x3401c4e]

goroutine 1 [running]:
github.com/openshift/installer/pkg/asset/installconfig/vsphere.validateESXiVersion(0xc001524060?, {0xc00018aff0, 0x43}, 0x1?, 0x1?)
        /go/src/github.com/openshift/installer/pkg/asset/installconfig/vsphere/validation.go:279 +0xb6e
github.com/openshift/installer/pkg/asset/installconfig/vsphere.validateFailureDomain(0xc001524060, 0xc00022c840, 0x0)
        /go/src/github.com/openshift/installer/pkg/asset/installconfig/vsphere/validation.go:167 +0x6b6
github.com/openshift/installer/pkg/asset/installconfig/vsphere.ValidateForProvisioning(0xc0003d4780)
        /go/src/github.com/openshift/installer/pkg/asset/installconfig/vsphere/validation.go:132 +0x675
github.com/openshift/installer/pkg/asset/installconfig.(*PlatformProvisionCheck).Generate(0xc0000f2000?, 0x5?)
        /go/src/github.com/openshift/installer/pkg/asset/installconfig/platformprovisioncheck.go:112 +0x45f
github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc000925e90, {0x1dc012d0, 0x2279afa8}, {0x7c34091, 0x2})
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:226 +0x5fa
github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc000925e90, {0x1dc01090, 0x22749ce0}, {0x0, 0x0})
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:220 +0x75b
github.com/openshift/installer/pkg/asset/store.(*storeImpl).Fetch(0x7ffe670305f1?, {0x1dc01090, 0x22749ce0}, {0x227267a0, 0x8, 0x8})
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:76 +0x48
main.runTargetCmd.func1({0x7ffe670305f1, 0x6})
        /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:260 +0x125
main.runTargetCmd.func2(0x2272da00?, {0xc000925410?, 0x3?, 0x3?})
        /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:290 +0xe7
github.com/spf13/cobra.(*Command).execute(0x2272da00, {0xc000925380, 0x3, 0x3})
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:920 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc000210900)
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:1040 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:968
main.installerMain()
        /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:61 +0x2b0
main.main()
        /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:38 +0xff

Expected results:

Installation to be completed successfully.

Additional info:

 

Description of problem:

EgressIP cannot be assigned to a HyperShift hosted cluster node.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-03-09-162945

How reproducible:

100%

Steps to Reproduce:

1. Set up a HyperShift environment


2. Label egress IP nodes on the hosted cluster
% oc get node
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-129-175.us-east-2.compute.internal   Ready    worker   3h20m   v1.26.2+bc894ae
ip-10-0-129-244.us-east-2.compute.internal   Ready    worker   3h20m   v1.26.2+bc894ae
ip-10-0-141-41.us-east-2.compute.internal    Ready    worker   3h20m   v1.26.2+bc894ae
ip-10-0-142-54.us-east-2.compute.internal    Ready    worker   3h20m   v1.26.2+bc894ae

% oc label node/ip-10-0-129-175.us-east-2.compute.internal k8s.ovn.org/egress-assignable=""
node/ip-10-0-129-175.us-east-2.compute.internal labeled
% oc label node/ip-10-0-129-244.us-east-2.compute.internal k8s.ovn.org/egress-assignable=""
node/ip-10-0-129-244.us-east-2.compute.internal labeled
% oc label node/ip-10-0-141-41.us-east-2.compute.internal k8s.ovn.org/egress-assignable=""
node/ip-10-0-141-41.us-east-2.compute.internal labeled
% oc label node/ip-10-0-142-54.us-east-2.compute.internal  k8s.ovn.org/egress-assignable=""
node/ip-10-0-142-54.us-east-2.compute.internal labeled


3. Create an EgressIP
% cat egressip.yaml 
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-1
spec:
  egressIPs: [ "10.0.129.180" ]
  namespaceSelector:
    matchLabels:
      env: ovn-tests
% oc apply -f egressip.yaml 
egressip.k8s.ovn.org/egressip-1 created


4. Check the EgressIP assignment
             

Actual results:

The EgressIP cannot be assigned to a node:

% oc get egressip
NAME         EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-1   10.0.129.180

Expected results:

The EgressIP can be assigned to one of the hosted cluster nodes.

Additional info:

 

Description of the problem:

Since MGMT-13083 merged, disconnected jobs are failing in the ephemeral installer (specifically e2e-agent-sno-ipv6 and e2e-agent-ha-dualstack). Preparing for installation fails because we can't get the installer binary:
 

Apr 21 10:00:43 master-0 service[2298]: time="2023-04-20T22:00:43Z" level=info msg="Successfully extracted openshift-baremetal-install binary from the release to: /data/install-config-generate/installercache/virthost.ostest.test.metalkube.org:5000/localimages/local-release-image@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install" func="github.com/openshift/assisted-service/internal/oc.(*release).extractFromRelease" file="/src/internal/oc/release.go:376" cluster_id=a3945e90-44a8-436c-89ad-12d3a5820a26 go-id=18956 request_id=
Apr 21 10:00:43 master-0 service[2298]: time="2023-04-20T22:00:43Z" level=error msg="failed generating install config for cluster a3945e90-44a8-436c-89ad-12d3a5820a26" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).generateClusterInstallConfig" file="/src/internal/bminventory/inventory.go:1738" cluster_id=a3945e90-44a8-436c-89ad-12d3a5820a26 error="failed to get installer path: Failed to create hard link to binary /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install: link /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/ln_1682028043_openshift-baremetal-install: no such file or directory" go-id=18956 pkg=Inventory request_id=
Apr 21 10:00:43 master-0 service[2298]: time="2023-04-20T22:00:43Z" level=warning msg="Cluster installation initialization failed" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).InstallClusterInternal.func3.1" file="/src/internal/bminventory/inventory.go:1339" cluster_id=a3945e90-44a8-436c-89ad-12d3a5820a26 error="failed generating install config for cluster a3945e90-44a8-436c-89ad-12d3a5820a26: failed to get installer path: Failed to create hard link to binary /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install: link /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/ln_1682028043_openshift-baremetal-install: no such file or directory" go-id=18932 pkg=Inventory request_id=ca799c5a-c798-4a93-9bf8-7f27ed93ca20
Apr 21 10:00:43 master-0 service[2298]: time="2023-04-20T22:00:43Z" level=warning msg="Failed to prepare installation of cluster a3945e90-44a8-436c-89ad-12d3a5820a26" func="github.com/openshift/assisted-service/internal/cluster.(*Manager).HandlePreInstallError" file="/src/internal/cluster/cluster.go:985" cluster_id=a3945e90-44a8-436c-89ad-12d3a5820a26 error="failed generating install config for cluster a3945e90-44a8-436c-89ad-12d3a5820a26: failed to get installer path: Failed to create hard link to binary /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install: link /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/openshift-baremetal-install /data/install-config-generate/installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release@sha256:63357ac661a312dde07b60350ea72428463853ea9a09cdf9487d853496a97d58/ln_1682028043_openshift-baremetal-install: no such file or directory" go-id=18956 pkg=cluster-state request_id=

The issue appears to be that we extract the binary to a path including the mirror registry (installercache/virthost.ostest.test.metalkube.org:5000/localimages/local-release-image) but then look for it at a path representing the original pullspec (installercache/registry.build05.ci.openshift.org/ci-op-1w73h6fv/release)

How reproducible:

 100%

Steps to reproduce:

1. Use the agent-based installer to install using a disconnected mirror registry in the ImageContentSources.
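As a rough sketch, the relevant install-config.yaml fragment uses the registries that appear in the log above (real values will differ per environment):

imageContentSources:
# Release content is pulled from the local mirror registry instead of the
# original location in the release pullspec.
- mirrors:
  - virthost.ostest.test.metalkube.org:5000/localimages/local-release-image
  source: registry.build05.ci.openshift.org/ci-op-1w73h6fv/release

With this in place, the installer binary is extracted under the mirror path in the cache but then looked up under the original pullspec path, as described above.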

Actual results:

Installation never starts; we just see a loop of:

level=debug msg=Host worker-0: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation)
level=debug msg=Host worker-1: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation)
level=debug msg=Host master-0: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation)
level=debug msg=Host master-1: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation)
level=debug msg=Host master-2: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation)
level=debug msg=Host worker-0: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation)
level=debug msg=Host worker-1: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation)
level=debug msg=Host master-0: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation)
level=debug msg=Host master-1: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation)
level=debug msg=Host master-2: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation)
level=debug msg=Host worker-0: updated status from preparing-successful to known (Host is ready to be installed)
level=debug msg=Host worker-1: updated status from preparing-successful to known (Host is ready to be installed)
level=debug msg=Host master-0: updated status from preparing-successful to known (Host is ready to be installed)
level=debug msg=Host master-1: updated status from preparing-successful to known (Host is ready to be installed)
level=debug msg=Host master-2: updated status from preparing-successful to known (Host is ready to be installed)

Expected results:

Cluster is installed.

Description of problem:

In our IBM Cloud use-case of RHCOS, we are seeing 4.13 RHCOS nodes failing to properly bootstrap to a HyperShift 4.13 control plane. RHCOS worker node kubelet is failing with "failed to construct kubelet dependencies: unable to load client CA file /etc/kubernetes/kubelet-ca.crt: open /etc/kubernetes/kubelet-ca.crt: no such file or directory". 

Version-Release number of selected component (if applicable):

4.13.0-rc.6

How reproducible:

100%

Steps to Reproduce:

1. Create a HyperShift 4.13 control plane
2. Boot a RHCOS host outside of cluster
3. After initial RHCOS boot, fetch ignition from control plane
4. Attempt to bootstrap to cluster via `machine-config-daemon firstboot-complete-machineconfig`

Actual results:

Kubelet service fails with "failed to construct kubelet dependencies: unable to load client CA file /etc/kubernetes/kubelet-ca.crt: open /etc/kubernetes/kubelet-ca.crt: no such file or directory".

Expected results:

RHCOS worker node to properly bootstrap to HyperShift control plane. This has been the supported bootstrapping flow for releases <4.13.

Additional info:

References:
- https://redhat-internal.slack.com/archives/C01C8502FMM/p1682968210631419
- https://github.com/openshift/machine-config-operator/pull/3575
- https://github.com/openshift/machine-config-operator/pull/3654

Description of problem:

The CredentialsRequest for a credentials secret generated by CCO on an STS Manual Mode cluster does not have its status set.

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

4.14.0

Steps to Reproduce:

1. Create a Manual mode, STS cluster in AWS.
2. Create a CredentialsRequest which provides .spec.cloudTokenPath and .spec.providerSpec.stsIAMRoleARN (a sketch is shown after these steps).
3. Observe that secret is created by CCO in the target namespace specified by the CredentialsRequest.
4. Observe that the CredentialsRequest does not set status once the secret is generated. Specifically, the CredentialsRequest does not set .status.provisioned == true.
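A minimal sketch of such a CredentialsRequest, assuming AWS and using placeholder names, a placeholder token path and a placeholder role ARN:

apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: example-component                                  # placeholder name
  namespace: openshift-cloud-credential-operator
spec:
  secretRef:
    name: example-component-creds                          # secret CCO should create
    namespace: example-target-namespace                    # placeholder target namespace
  serviceAccountNames:
  - example-component
  cloudTokenPath: /var/run/secrets/openshift/serviceaccount/token   # placeholder path
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
    - effect: Allow
      action:
      - s3:GetObject
      resource: "*"
    stsIAMRoleARN: arn:aws:iam::123456789012:role/example-role      # placeholder ARN

Per the report, CCO creates the secret in the target namespace but never sets .status.provisioned on this CredentialsRequest.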

Actual results:

Status is not set on CredentialsRequest with provisioned secret.

Expected results:

Status is set on CredentialsRequest with provisioned secret.

Additional info:

Reported by Jan Safranek when testing integration with the aws-efs-csi-driver-operator.

Description of problem: When running in development mode [1], the Loaded and Enabled plugin counts in the Cluster Dashboard Dynamic Plugins popover may be incorrect. In order to make the experience less confusing for users working with the console in development mode, we need to:

Note there is additional work planned in https://issues.redhat.com/browse/CONSOLE-3185. This bug is intended to only capture improving the experience for development mode.

[1] https://github.com/openshift/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/README.md#plugin-development

As IBM, I would like to replace the --use-oci-feature flag with --include-oci-local-catalogs.

--use-oci-feature implies to users that this might be about using the OCI format for images rather than docker-v2, which can be hard to understand and generates questions, bugs, and misunderstood requests.
For clarity, and before this feature goes GA, this flag will be replaced by --include-local-oci-catalog in 4.14. The --use-oci-feature flag will be marked deprecated in 4.13 and completely removed in 4.14.

 

As an oc-mirror user, I want a well-documented and intuitive process,
so that I can effectively and efficiently deliver image artifacts in both connected and disconnected installs with no impact on my current workflow.

Glossary:

  • OCI-FBC operator catalog: a catalog image in OCI format saved to disk, referenced with oci://path-to-image (see the sketch after this glossary)
  • registry-based operator catalog: a catalog image hosted on a container registry.
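A hedged sketch of an ImageSetConfiguration that mixes an OCI-FBC catalog on disk with a registry-based catalog; the paths, catalog references and package names below are placeholders:

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: ./oc-mirror-metadata                     # placeholder storage location
mirror:
  operators:
  # OCI-FBC operator catalog: saved to disk, referenced with oci://
  - catalog: oci:///home/user/catalogs/redhat-operator-index
    packages:
    - name: example-operator                       # placeholder package
  # registry-based operator catalog: hosted on a container registry
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.13
    packages:
    - name: another-operator                       # placeholder package

Mirroring would then be run as in the acceptance criteria below, e.g. oc-mirror -c config.yaml docker://remote-registry --use-oci-feature.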

References:

 

Acceptance criteria:

  • No regression on oc-mirror use cases that are not using OCI-FBC feature
  • mirrorToMirror use case with oci feature flag should be successful when all operator catalogs in ImageSetConfig are OCI-FBC:
    • oc-mirror -c config.yaml docker://remote-registry --use-oci-feature succeeds
    • All release images, helm charts, additional images are mirrored to the remote-registry in an incremental manner (only new images are mirrored based on contents of the storageConfig)
    • All catalogs OCI-FBC, selected bundles and their related images are mirrored to the remote-registry and corresponding catalogSource and ImageSourceContentPolicy generated
    • All registry based catalogs, selected bundles and their related images are mirrored to the remote-registry and corresponding catalogSource and ImageSourceContentPolicy generated
  • mirrorToDisk use case with the oci feature flag is forbidden. The following command should fail:
    • oc-mirror --from=seq_xx_tar docker://remote-registry --use-oci-feature
  • diskToMirror use case with oci feature flag is forbidden. The following command should fail:

Please review the following PR: https://github.com/openshift/cloud-provider-ibm/pull/48

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

After adding a FailureDomain topology as a day-2 operation, I get ProvisioningFailed due to "error generating accessibility requirements: no topology key found on CSINode ocp-storage-fxsc6-worker-0-fb977".

Version-Release number of selected component (if applicable):

pre-merge payload with opt-in CSIMigration PRs

How reproducible:

2/2

Steps to Reproduce:

1. I installed the cluster without specifying the failureDomains (so I got the one generated by the installer)
2. Added a new failureDomain to test topology, and made sure all related resources (datacenter and ClusterComputeResource) are tagged in vSphere (see the sketch after these steps)
3. Created a PVC, but provisioning failed:
Warning ProvisioningFailed 80m (x14 over 103m) csi.vsphere.vmware.com_ocp-storage-fxsc6-master-0_a18e2651-6455-42b2-abc2-b3b3d197da56 failed to provision volume with StorageClass "thin-csi": error generating accessibility requirements: no topology key found on CSINode ocp-storage-fxsc6-worker-0-fb977

4. Here is the node label and csinode info 
$ oc get node ocp-storage-fxsc6-worker-0-b246w --show-labels 
NAME STATUS ROLES AGE VERSION LABELS 
ocp-storage-fxsc6-worker-0-b246w Ready worker 8h v1.26.3+2727aff beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=ocp-storage-fxsc6-worker-0-b246w,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos 
$ oc get csinode ocp-storage-fxsc6-worker-0-b246w -ojson | jq .spec.drivers[].topologyKeys 
null 

5. Other logs:
I only found the following in csi-driver-controller-8597f567f8-4f8z6:
{"level":"info","time":"2023-04-17T10:30:13.352999527Z","caller":"k8sorchestrator/topology.go:326","msg":"failed to retrieve tags for category \"cns.vmware.topology-preferred-datastores\". Reason: GET https://ocp-storage.vmc.qe.devcluster.openshift.com:443/rest/com/vmware/cis/tagging/category/id:cns.vmware.topology-preferred-datastores: 404 Not Found","TraceId":"573c3fc8-e6cf-4594-8154-07bd514fcb46"}

In the vpd pod, the tag check passed:
I0417 11:05:02.711093 1 util.go:110] Looking for CC: workloads-02
I0417 11:05:02.766516 1 zones.go:168] ClusterComputeResource: ClusterComputeResource:domain-c5265 @ /OCP-DC/host/workloads-02
I0417 11:05:02.766622 1 zones.go:64] Validating tags for ClusterComputeResource:domain-c5265.
I0417 11:05:02.813568 1 zones.go:81] Processing attached tags
I0417 11:05:02.813678 1 zones.go:90] Found Region: region-A
I0417 11:05:02.813721 1 zones.go:96] Found Zone: zone-B
I0417 11:05:02.834718 1 util.go:110] Looking for CC: qe-cluster/workloads-03
I0417 11:05:02.844475 1 reflector.go:559] k8s.io/client-go@v0.26.1/tools/cache/reflector.go:169: Watch close - *v1.ConfigMap total 7 items received
I0417 11:05:02.890279 1 zones.go:168] ClusterComputeResource: ClusterComputeResource:domain-c9002 @ /OCP-DC/host/qe-cluster/workloads-03
I0417 11:05:02.890406 1 zones.go:64] Validating tags for ClusterComputeResource:domain-c9002.
I0417 11:05:02.946720 1 zones.go:81] Processing attached tags
I0417 11:05:02.946871 1 zones.go:96] Found Zone: zone-C
I0417 11:05:02.946917 1 zones.go:90] Found Region: region-A
I0417 11:05:02.946965 1 vsphere_check.go:242] CheckZoneTags passed
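As a rough sketch (not the exact values used on this cluster), the failureDomain added in step 2 would have roughly this shape in the vSphere platform spec; the name, server, datastore and network below are placeholders:

failureDomains:
- name: zone-b                                # placeholder failure domain name
  region: region-A                            # region tag attached in vSphere
  zone: zone-B                                # zone tag attached in vSphere
  server: vcenter.example.com                 # placeholder vCenter server
  topology:
    datacenter: OCP-DC
    computeCluster: /OCP-DC/host/workloads-02
    datastore: /OCP-DC/datastore/example-datastore   # placeholder datastore path
    networks:
    - example-portgroup                       # placeholder port group

The vpd check above confirms the region/zone tags are attached to the ClusterComputeResources, yet the CSINode never gets topology keys.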

Actual results:

Provisioning failed.

Expected results:

Provisioning should succeed.

Additional info:

 

Description of problem:

When typing into the filter input field on the Quick Starts page, the console crashes.

Version-Release number of selected component (if applicable):

4.13.0-rc.7

How reproducible:

Always

Steps to Reproduce:

1. Go to the Quick Starts page 
2. Type something into the filter input field
3.

Actual results:

Console will crash:


TypeError
Description:
t.toLowerCase is not a function
Component trace:
at Sn (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:36:168364)
    at t.default (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:874032)
    at t.default (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/quick-start-chunk-274c58e3845ea0aa718b.min.js:1:202)
    at s (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:241397)
    at s (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:241397)
    at t (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:21:67583)
    at T
    at t (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:21:69628)
    at Suspense
    at i (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:450974)
    at section
    at m (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:720272)
    at div
    at div
    at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1528877)
    at div
    at div
    at c (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:545409)
    at d (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:774923)
    at div
    at d (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:458124)
    at l (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1170951)
    at https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:457833
    at S (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:98:86864)
    at main
    at div
    at v (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:264066)
    at div
    at div
    at c (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:62024)
    at div
    at div
    at c (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:545409)
    at d (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:774923)
    at div
    at d (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendor-patternfly-core-chunk-67ceb971158ed93c9c79.min.js:1:458124)
    at Un (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:36:183620)
    at t.default (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:874032)
    at t.default (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/quick-start-chunk-274c58e3845ea0aa718b.min.js:1:1261)
    at s (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:241397)
    at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1605535)
    at ee (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1623254)
    at _t (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:36:142374)
    at ee (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1623254)
    at ee (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1623254)
    at ee (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1623254)
    at i (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:829516)
    at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1599727)
    at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1599916)
    at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1597332)
    at te (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1623385)
    at https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1626517
    at r (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:36:121910)
    at t (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:21:67583)
    at t (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:21:69628)
    at t (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:21:64188)
    at re (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1626828)
    at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:803496)
    at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:1074899)
    at s (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/main-chunk-4a1d080acbda22020fbd.min.js:1:652518)
    at t.a (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:150:190871)
    at Suspense
Stack trace:
TypeError: t.toLowerCase is not a function
    at pt (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:36:136019)
    at Sn (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:36:168723)
    at na (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:58879)
    at za (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:68397)
    at Hs (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:112289)
    at xl (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:98327)
    at Cl (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:98255)
    at _l (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:98118)
    at pl (https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:95105)
    at https://console-openshift-console.apps.viraj-10-05-2023-2.devcluster.openshift.com/static/vendors~main-chunk-141f889230d63da0ba53.min.js:263:44774

Expected results:

Console should work

Additional info:

 

Description of problem:

When installing a cluster on IBM Cloud, the image registry defaults to Removed with no storage configured after 4.13.0-ec.3.
The image registry should use ibmcos object storage on an IPI IBM Cloud cluster:
https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/storage/storage.go#L182
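For comparison, a correctly configured registry would end up roughly like the sketch below in config.image/cluster; the ibmcos field names and values here are assumptions for illustration only, not output from this cluster:

spec:
  managementState: Managed
  replicas: 1
  storage:
    ibmcos:
      bucket: example-image-registry-bucket    # placeholder bucket name
      location: eu-gb                          # region of the cluster
      resourceGroupName: example-rg            # placeholder resource group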

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-02-27-101545

How reproducible:

always

Steps to Reproduce:

1. Install an IPI cluster on IBM Cloud
2. Check the image registry after the install succeeds
3.

Actual results:

oc get config.image/cluster -o yaml 
  spec:
    logLevel: Normal
    managementState: Removed
    observedConfig: null
    operatorLogLevel: Normal
    proxy: {}
    replicas: 1
    requests:
      read:
        maxWaitInQueue: 0s
      write:
        maxWaitInQueue: 0s
    rolloutStrategy: RollingUpdate
    storage: {}
    unsupportedConfigOverrides: null
oc get infrastructure cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-03-02T02:21:06Z"
  generation: 1
  name: cluster
  resourceVersion: "531"
  uid: 8d61a1e2-3852-40a2-bf5d-b7f9c92cda7b
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: IBMCloud
status:
  apiServerInternalURI: https://api-int.wxjibm32.ibmcloud.qe.devcluster.openshift.com:6443
  apiServerURL: https://api.wxjibm32.ibmcloud.qe.devcluster.openshift.com:6443
  controlPlaneTopology: HighlyAvailable
  etcdDiscoveryDomain: ""
  infrastructureName: wxjibm32-lmqh7
  infrastructureTopology: HighlyAvailable
  platform: IBMCloud
  platformStatus:
    ibmcloud:
      cisInstanceCRN: 'crn:v1:bluemix:public:internet-svcs:global:a/fdc2e14cf8bc4d53a67f972dc2e2c861:e8ee6ca1-4b31-4307-8190-e67f6925f83b::'
      location: eu-gb
      providerType: VPC
      resourceGroupName: wxjibm32-lmqh7
    type: IBMCloud 

Expected results:

Image registry should use ibmcos object storage on IPI-IBM cluster 

Additional info:

Must-gather log https://drive.google.com/file/d/1N-WUOZLRjlXcZI0t2O6MXsxwnsVPDCGQ/view?usp=share_link 

Description of problem:

This may be something we want to either add a validation for or document. It was initially found at a customer site but I've also confirmed it happens with just a Compact config with no workers. 

They created an agent-config.yaml with 2 worker nodes but did not set the replicas in install-config.yaml, i.e. they did not set 
compute:
- hyperthreading: Enabled
  name: worker
  replicas: {{ num_workers }} 

This resulted in an install failure, as by default 3 worker replicas are created if none are defined:
https://github.com/openshift/installer/blob/master/pkg/types/defaults/machinepools.go#L11
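For an agent-config.yaml that defines 2 worker hosts, the workaround is to state the replica count explicitly in install-config.yaml; a minimal compute stanza for illustration:

compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  replicas: 2        # must match the number of worker hosts defined in agent-config.yaml

With replicas matching the hosts in agent-config.yaml, start-cluster-installation.sh waits for the expected host count.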

See the attached console screenshot showing that the expected number of hosts doesn't match the actual.

I've also duplicated this with a compact config. We can see that the install failed as start-cluster-installation.sh is looking for 6 hosts.

[core@master-0 ~]$ sudo systemctl status start-cluster-installation.service
● start-cluster-installation.service - Service that starts cluster installation
   Loaded: loaded (/etc/systemd/system/start-cluster-installation.service; enabled; vendor preset: enabled)
   Active: activating (start) since Wed 2023-03-15 14:40:04 UTC; 3min 41s ago
 Main PID: 3365 (start-cluster-i)
    Tasks: 5 (limit: 101736)
   Memory: 1.7M
   CGroup: /system.slice/start-cluster-installation.service
           ├─3365 /bin/bash /usr/local/bin/start-cluster-installation.sh
           ├─5124 /bin/bash /usr/local/bin/start-cluster-installation.sh
           ├─5132 /bin/bash /usr/local/bin/start-cluster-installation.sh
           └─5138 diff /tmp/tmp.vIq1jH9Vf2 /etc/issue.d/90_start-install.issue
Mar 15 14:42:54 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation...
Mar 15 14:43:04 master-0 start-cluster-installation.sh[4746]: Hosts known and ready for cluster installation (3/6)
Mar 15 14:43:04 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation...
Mar 15 14:43:15 master-0 start-cluster-installation.sh[4980]: Hosts known and ready for cluster installation (3/6)
Mar 15 14:43:15 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation...
Mar 15 14:43:25 master-0 start-cluster-installation.sh[5026]: Hosts known and ready for cluster installation (3/6)
Mar 15 14:43:25 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation...
Mar 15 14:43:35 master-0 start-cluster-installation.sh[5079]: Hosts known and ready for cluster installation (3/6)
Mar 15 14:43:35 master-0 start-cluster-installation.sh[3365]: Waiting for hosts to become ready for cluster installation...
Mar 15 14:43:45 master-0 start-cluster-installation.sh[5124]: Hosts known and ready for cluster installation (3/6)

Since the compute section in install-config.yaml is optional, we can't assume that it will be present:
https://github.com/openshift/installer/blob/master/pkg/types/installconfig.go#L126
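Until such a validation or documentation exists, one way to avoid the mismatch is to declare the worker count explicitly in install-config.yaml so that it matches the worker hosts in agent-config.yaml; a minimal sketch (the replica count here is only an example):

compute:
- hyperthreading: Enabled
  name: worker
  replicas: 2   # must match the number of worker hosts defined in agent-config.yaml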

Version-Release number of selected component (if applicable):

4.12

How reproducible:

 

Steps to Reproduce:

1. Remove the compute section from install-config.yaml
2. Do an install
3. See the failure

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Set custom security group IDs in the following fields of install-config.yaml

installconfig.controlPlane.platform.aws.additionalSecurityGroupIDs
installconfig.compute.platform.aws.additionalSecurityGroupIDs

such as: 

apiVersion: v1
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      additionalSecurityGroupIDs:
      - sg-0d2f88b2980aa5547
      - sg-01f1d2f60a3b4cf6d
  replicas: 3
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      additionalSecurityGroupIDs:
      - sg-03418b6e2f68e1f63
      - sg-0376fc68fd4b834a4
  replicas: 3


After installation, check the security groups attached to the masters and workers: the masters don't have the specified custom security groups attached, while the workers do.

For one of the masters:
[root@preserve-gpei-worker ~]# aws ec2 describe-instances --instance-ids i-0cd007cca57c86ee9 --region us-west-2 --query 'Reservations[*].Instances[*].SecurityGroups[*]' --output json
[
    [
        [
            {
                "GroupName": "terraform-20230713031140984600000002",
                "GroupId": "sg-05495718555950f77"
            }
        ]
    ]
]

For one of the workers:
[root@preserve-gpei-worker ~]# aws ec2 describe-instances --instance-ids i-0572b7bde8ff07ac4 --region us-west-2 --query 'Reservations[*].Instances[*].SecurityGroups[*]' --output json
[
    [
        [
            {
                "GroupName": "gpei-0613a-worker-2",
                "GroupId": "sg-0376fc68fd4b834a4"
            },
            {
                "GroupName": "gpei-0613a-worker-1",
                "GroupId": "sg-03418b6e2f68e1f63"
            },
            {
                "GroupName": "terraform-20230713031140982700000001",
                "GroupId": "sg-0ce73044e426fe249"
            }
        ]
    ]
]

Also checked the master's ControlPlaneMachineSet; it does have the custom security groups configured, but they're not attached to the master instance in the end.

[root@preserve-gpei-worker k_files]# oc get controlplanemachineset -n openshift-machine-api cluster -o yaml |yq .spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.securityGroups
- filters:
    - name: tag:Name
      values:
        - gpei-0613a-pzjbk-master-sg
- id: sg-01f1d2f60a3b4cf6d
- id: sg-0d2f88b2980aa5547



Version-Release number of selected component (if applicable):

registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-07-11-092038

How reproducible:

 Always

Steps to Reproduce:

1. As mentioned above
2.
3.

Actual results:

The masters don't have the custom security groups attached.

Expected results:

The masters should have the custom security groups attached, like the workers.

Additional info:


Description of problem:

Web-terminal tests are constantly failing in CI. Disable them until they are fixed.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-console-master-e2e-gcp-console

https://search.ci.openshift.org/?search=Web+Terminal+for+Admin+user&maxAge=336h&context=1&type=junit&name=pull-ci-openshift-console-master-e2e-gcp-console&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Expected results:


Additional info:


Description of problem:

CCO's ServiceAccount cannot list ConfigMaps at the cluster scope.  

Steps to Reproduce:

1. Install an OCP cluster (4.14.0-0.nightly-2023-07-17-215017, CCO commit id = 0c80cc35f6ee4b45016050b3e5a8710a8ed4dd81) with default configuration (CCO in default mode)

2. Create a dummy CredentialsRequest as follows:
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: test-cr
  namespace: openshift-cloud-credential-operator
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
    - action:
      - ec2:CreateTags
      effect: Allow
      resource: '*'
    stsIAMRoleARN: whatever
  secretRef:
    name: test-secret
    namespace: default
  serviceAccountNames:
  - default 

3. Check CCO Pod logs:
time="2023-07-18T10:02:45Z" level=info msg="reconciling clusteroperator status"
time="2023-07-18T10:02:45Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/test-cr
time="2023-07-18T10:02:45Z" level=info msg="adding finalizer: cloudcredential.openshift.io/deprovision" controller=credreq cr=openshift-cloud-credential-operator/test-cr secret=default/test-secret
time="2023-07-18T10:02:45Z" level=info msg="syncing credentials request" controller=credreq cr=openshift-cloud-credential-operator/test-cr
time="2023-07-18T10:02:45Z" level=info msg="stsFeatureGateEnabled: false" actuator=aws cr=openshift-cloud-credential-operator/test-cr
time="2023-07-18T10:02:45Z" level=info msg="stsDetected: false" actuator=aws cr=openshift-cloud-credential-operator/test-cr
time="2023-07-18T10:02:45Z" level=info msg="clusteroperator status updated" controller=status
time="2023-07-18T10:02:45Z" level=info msg="reconciling clusteroperator status"
time="2023-07-18T10:02:45Z" level=info msg="reconciling clusteroperator status"
time="2023-07-18T10:02:45Z" level=info msg="reconciling clusteroperator status"
time="2023-07-18T10:02:45Z" level=info msg="reconciling clusteroperator status"
W0718 10:02:45.352434       1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0718 10:02:45.352460       1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
W0718 10:02:46.512738       1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0718 10:02:46.512763       1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
W0718 10:02:48.859931       1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0718 10:02:48.859957       1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
W0718 10:02:53.514713       1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0718 10:02:53.514798       1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
W0718 10:03:03.042040       1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0718 10:03:03.042068       1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
W0718 10:03:25.023729       1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0718 10:03:25.023758       1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
time="2023-07-18T10:04:10Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2023-07-18T10:04:10Z" level=info msg="reconcile complete" controller=metrics elapsed=4.470475ms
W0718 10:04:11.033286       1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0718 10:04:11.033311       1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
W0718 10:04:42.316200       1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0718 10:04:42.316223       1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
W0718 10:05:40.852983       1 reflector.go:533] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
E0718 10:05:40.853008       1 reflector.go:148] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:233: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-credential-operator:cloud-credential-operator" cannot list resource "configmaps" in API group "" at the cluster scope
time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status"
time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status"
time="2023-07-18T10:06:10Z" level=info msg="calculating metrics for all CredentialsRequests" controller=metrics
time="2023-07-18T10:06:10Z" level=info msg="reconcile complete" controller=metrics elapsed=3.531182ms
time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status"
time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status"
time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status"
time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status"
time="2023-07-18T10:06:10Z" level=info msg="reconciling clusteroperator status"
... 

When we create an HCP, the Root CA in the HCP namespaces has the certificate and key named as

  • ca.key
  • ca.crt
    but cert-manager expects them to be named as
  • tls.key
  • tls.cert

Done criteria: The Root CA should have the certificate and key named as the cert manager expects.
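For reference, a standard Kubernetes TLS Secret (type kubernetes.io/tls), which is the shape cert-manager consumes, keys its data as tls.crt and tls.key; a minimal sketch with a placeholder name and values:

apiVersion: v1
kind: Secret
metadata:
  name: root-ca            # placeholder name
type: kubernetes.io/tls
data:
  tls.crt: <base64-encoded CA certificate>
  tls.key: <base64-encoded CA key>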

Description of problem:

A cluster recently upgraded to OCP 4.12.19 is experiencing serious slowness with the Project > Project access page.
The loading time of that page grows significantly faster than the number of entries, and is very noticeable even at a relatively low number of entries.

Version-Release number of selected component (if applicable):

4.12.19

How reproducible:

Easily 

Steps to Reproduce:

1. Create a namespace, and add RoleBindings for multiple users, for instance with :
$ oc -n test-namespace create rolebinding test-load --clusterrole=view --user=user01 --user=user02 --user=...
2. In Developer view of that namespace, navigate to "Project"->"Project access". The page will take a long time to load compared to the time an "oc get rolebinding" would take.
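To scale the reproducer from step 1 up to a given number of entries, a loop like the following can be used (a sketch; the namespace and user names are illustrative):

oc new-project test-namespace
for i in $(seq -w 1 200); do
  oc -n test-namespace create rolebinding "test-load-$i" --clusterrole=view --user="user$i"
done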

Actual results:

0 RB => instantaneous loading
40 RB => about 10 seconds until page loaded
100 RB => one try took 50 seconds, another 110 seconds
200 RB => nothing for 8 minutes, after which my web browser (Firefox) proposed to stop the page since it slowed the browser down, and after 10 minutes I stopped the attempt without ever seeing the page load. 

Expected results:

Page should load almost instantly with only a few hundred role bindings

Description of problem:

container_network* metrics stop reporting after a container restarts. Other container_* metrics continue to report for the same pod. 

How reproducible:

Issue can be reproduced by triggering a container restart 

Steps to Reproduce:

1.Restart container 
2.Check metrics and see container_network* not reporting

Additional info:
Ticket with more detailed debugging process OHSS-16739

For static pod readiness we check the /readyz and /healthz endpoints of kube-apiserver. For SNO, exclude openshift-apiserver from the health checks using the 'exclude' query parameter.

Example:
> oc get --raw '/readyz?verbose&exclude=api-openshift-apiserver-available'

Should we also remove 'oauth-apiserver'?
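If so, the check could exclude both aggregated-apiserver readiness checks; a sketch, where the oauth check name is an assumption following the same naming pattern as the one above:

oc get --raw '/readyz?verbose&exclude=api-openshift-apiserver-available&exclude=api-openshift-oauth-apiserver-available'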

As a HyperShift developer, I would like a config file created to control the creation frequency of RHTAP PRs so that the HyperShift repo & CI are not inundated with RHTAP PRs.

Description of problem:

When a fresh normal user visits the BuildConfigs page of the 'default' project, an error page is shown.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-05-191022

How reproducible:

Always

Steps to Reproduce:

1. normal user without any projects login to console 
2. switch to Admin perspective
3. Visit workloads page for 'default' project, for example
/k8s/ns/default/route.openshift.io~v1~Route
/k8s/ns/default/core~v1~Service
/k8s/ns/default/apps~v1~Deployment
/k8s/ns/default/build.openshift.io~v1~BuildConfig

Actual results:

3. We can see an error page when visiting BuildConfigs page 

Expected results:

3. no error should be shown and show consistent info with other workloads page

Additional info:

 

Description of problem:

Pods assigned a Multus Whereabouts IP get stuck in the ContainerCreating state after OCP is upgraded from 4.12.15 to 4.12.22. It is unclear whether the upgrade itself or the node reboot causes the issue.

The error message is:
(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox mypod-0-0-1-0_testproject_8c8500e1-1643-4716-8fd7-e032292c62ab_0(2baa045a1b19291769ed56bab288b60802179ff3138ffe0d16a14e78f9cb5e4f): error adding pod testproject_mypod-0-0-1-0 to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [testproject/mypod-0-0-1-0/8c8500e1-1643-4716-8fd7-e032292c62ab:testproject-net-svc-kernel-bond]: error adding container to network "testproject-net-svc-kernel-bond": error at storage engine: k8s get error: client rate limiter Wait returned an error: rate: Wait(n=1) would exceed context deadline



Version-Release number of selected component (if applicable):

 

How reproducible:

Not sure if it is reproducible

Steps to Reproduce:

1.
2.
3.

Actual results:

Pods stuck in ContainerCreating state 

Expected results:

Pods are created normally

Additional info:

The customer reported that deleting the StatefulSet and recreating it didn't work.
The pods can be created normally after manually deleting the corresponding ippools.whereabouts.cni.cncf.io resource:
$ oc delete ippools.whereabouts.cni.cncf.io 172.21.24.0-22 -n openshift-multus

It is caused by the power off routine, which initialises last_error to None. The field is later restored, but BMO manages to observe and record the wrong value.

This issue is not trivial to reproduce in the product. You need OCPBUGS-2471 to land first, then you need to trigger the cleaning failure several times. I used direct access to Ironic via CLI to abort cleaning (`baremetal node abort <node name>`) during deprovisioning. After a few attempts you can observe the following in the BMH's status:

status:
  errorCount: 2
  errorMessage: 'Cleaning failed: '
  errorType: provisioning error

The empty message after the colon is a sign of this bug.

test=[sig-cluster-lifecycle][Feature:Machines][Serial] Managed cluster should grow and decrease when scaling different machineSets simultaneously [Timeout:30m][apigroup:machine.openshift.io] [Suite:openshift/conformance/serial]

Appears to be perma-failing on gcp serial jobs.

We're at the edge of our visible data, but it looks like this may have happened around July 7

Sample failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-sdn-techpreview-serial/1681814026218115072

Description of problem:
Ingress-canary Daemon Set does not tolerate Infra taint "NoExecute"

Version-Release number of selected component (if applicable):
OCPv4.9

How reproducible:
Always

Steps to Reproduce:
1.Label and Taint Node
$ oc describe node worker-0.cluster49.lab.pnq2.cee.redhat.com | grep infra
Roles: custom,infra,test
node-role.kubernetes.io/infra= <----
Taints: node-role.kubernetes.io/infra=reserved:NoExecute <----
node-role.kubernetes.io/infra=reserved:NoSchedule <----

2.Edit ingress-canary ds and add NoExecute toleration
$ oc get ds -o yaml | grep -i tole -A6
tolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/infra
  value: reserved
- effect: NoExecute <----
  key: node-role.kubernetes.io/infra <----
  value: reserved <----

3. The Daemon Set configuration gets overwritten after some time, probably by the managing operator, and the pods are terminated on the infra nodes.

Actual results:
Infra taint toleration NoExecute gets overwritten :
$ oc get ds -o yaml | grep -i tole -A6
tolerations:
- effect: NoSchedule
  key: node-role.kubernetes.io/infra
  operator: Exists

Expected results:
Ingress canary Daemon Set should be able to tolerate the NoExecute taint toleration.

Additional info: The same taints as in the product documentation are used (node-role.kubernetes.io/infra)

Description of problem:

The hypershift CLI tool allows any string as the cluster name, but later, when the cluster is to be imported, the name needs to conform to RFC 1123.

So the user needs to read the error, destroy the cluster and then try again with a proper name. This experience can be improved.

Version-Release number of selected component (if applicable):

4.13.4

How reproducible:

Always

Steps to Reproduce:

1. hypershift create cluster kubevirt --name virt-4.12 ...
2. try to import it

Actual results:

cluster fails to import due to its name

Expected results:

validate the cluster name in the hypershift cli, fail early
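A minimal pre-flight check along these lines in the CLI would catch it early; a sketch, assuming the name must be a lowercase RFC 1123 label:

name="virt-4.12"
if [[ ! "$name" =~ ^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$ ]]; then
  # "virt-4.12" fails here because '.' is not allowed in an RFC 1123 label
  echo "invalid cluster name '$name': must be a lowercase RFC 1123 label" >&2
  exit 1
fi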

Additional info:

 

This is a clone of issue OCPBUGS-18832. The following is the description of the original issue:

Description of problem:

The console does not allow customizing the abbreviation that appears on the resource icon badge. This causes an issue for the FAR operator with the CRD FenceAgentRemediationTemplate: the badge icon shows FART. The CRD includes a custom short name, but the console ignores it.

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

1. create the CRD (included link to github)
2. navigate to Home -> search
3. Enter far into the Resources filter

Actual results:

The badge FART shows in the dropdown

Expected results:

The badge should show fartemplate - the content of the short name

Additional info:

 

 

This is a clone of issue OCPBUGS-22413. The following is the description of the original issue:

OKD's sample operator is using a different set of images; specifically for mysql, it's importing them from quay.io.

So the "Only known images used by tests" test from the e2e suite frequently fails.

registry.redhat.io/rhel8/mysql-80:latest from pods:
  ns/e2e-test-oc-builds-57lj7 pod/database-1-9zdgz node/ip-10-0-95-30.ec2.internal
 

Description of problem:

On the Add Storage page, if the user chooses "Use existing claim" but leaves the PVC name empty, then fills in the other fields and clicks "Save", there is no warning about the PVC name field. The loading dots are shown under the "Save" button instead.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-12-124310

How reproducible:

Always

Steps to Reproduce:

1.Create a deployment.
2.Click "Add Storage" item in action list of the deployment
3.Choose "Use existing claim", but leave it empty.
4.Set mount dir and click "Save".

Actual results:

4. There is no warning about the empty PVC name.

Expected results:

4. Should show a message for the field: "Please fill out this field"

Additional info:

 

Description of problem:

oc-mirror fails on the arm64 platform with this error:
Rendering catalog image "ec2-18-224-73-36.us-east-2.compute.amazonaws.com:5000/arm/home/ec2-user/ocmtest/oci-multi-index:1fb06f" with file-based catalog 
Rendering catalog image "ec2-18-224-73-36.us-east-2.compute.amazonaws.com:5000/arm/redhat/community-operator-index:v4.13" with file-based catalog 
error: error rebuilding catalog images from file-based catalogs: error regenerating the cache for ec2-18-224-73-36.us-east-2.compute.amazonaws.com:5000/arm/redhat/community-operator-index:v4.13: fork/exec /home/ec2-user/ocmtest/oc-mirror-workspace/src/catalogs/registry.redhat.io/redhat/community-operator-index/v4.13/bin/usr/bin/registry/opm: exec format error

Version-Release number of selected component (if applicable):


How reproducible:

always 

Steps to Reproduce:

1.  Clone the repo to arm64 cluster and build oc-mirror;
2. Copy the catalog index to localhost ;
`skopeo copy --all  --format oci  docker://registry.redhat.io/redhat/redhat-operator-index:v4.13 oci:///home/ec2-user/ocmtest/oci-multi-index  --remove-signatures`
3.  Run the oc-mirror command :
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
archiveSize: 16
mirror:
  operators:
  - catalog: oci:///home/ec2-user/ocmtest/oci-multi-index
    full: false # only mirror the latest versions
    packages:
    - name: cluster-logging
  - catalog: registry.redhat.io/redhat/community-operator-index:v4.13
    full: false # only mirror the latest versions
    packages:
    - name: namespace-configuration-operator
`oc-mirror --config config-413.yaml docker://xxxx:5000/arm --dest-skip-tls` 

Expected results:

No errors; the mirror should succeed.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

openshift-azure-routes.path has the following [Path] section:

[Path]
PathExistsGlob=/run/cloud-routes/*
PathChanged=/run/cloud-routes/
MakeDirectory=true

 

There was a change in systemd that re-checks the files watched with PathExistsGlob once the service finishes:

With this commit, systemd rechecks all paths specs whenever the triggered unit deactivates. If any PathExists=, PathExistsGlob= or DirectoryNotEmpty= predicate passes, the triggered unit is reactivated

 

This means that openshift-azure-routes will get triggered all the time as long as there are files in /run/cloud-routes.

Several of the 4.14 tech preview jobs have been perma-failing since January 20th:
https://prow.ci.openshift.org/job-history/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-techpreview

example failures:
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-techpreview/1749740678981619712
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-azure-sdn-techpreview/1749740680076333056
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-sdn-techpreview/1749740673168314368

All the jobs fail in the same way:
fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:459]: possibly some services didn't register ServiceMonitors to allow metrics collection

It seems that some resource is not getting installed in tech preview clusters, which is causing this test to fail.

Unfortunately these jobs failing prevents us from accepting 4.14 payloads, which makes this extremely important to resolve quickly.

https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2024-01-20-231201 represents the first payload that showed these failures but I did not see anything obviously suspect in the commit changes between it and the last accepted payload.

The failing tech preview jobs block payload acceptance because this payload acceptance job depends on consuming results from the tech preview jobs: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.14-overall-analysis-all/1749818281608351744

Description of problem:

In HA mode there are two dedicated nodes, but ignition-server-proxy and konnectivity-server only have one replica each. I expect them to have two replicas, with each replica running on one dedicated node.

Version-Release number of selected component (if applicable):

 

How reproducible:

always

Steps to Reproduce:

1. allocate two dedicated nodes
2. create a cluster in HA mode
3. check ignition-server-proxy and konnectivity-server in control plane

Actual results:

ignition-server-proxy and konnectivity-server have one replica

Expected results:

ignition-server-proxy and konnectivity-server should have two replicas, with each replica running on one dedicated node

Additional info:

 

Description of problem:

Adding an audit configuration for a HyperShift hosted cluster does not work as expected.

Version-Release number of selected component (if applicable):

# oc get clusterversions.config.openshift.io
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-05-04-090524   True        False         15m     Cluster version is 4.13.0-0.nightly-2023-05-04-090524       

How reproducible:

Always

Steps to Reproduce:

1. Get hypershift hosted cluster detail from management cluster. 

# hostedcluster=$( oc get -n clusters hostedclusters -o json | jq -r .items[].metadata.name)  

2. Apply audit profile for hypershift hosted cluster. 
# oc patch HostedCluster $hostedcluster -n clusters -p '{"spec": {"configuration": {"apiServer": {"audit": {"profile": "WriteRequestBodies"}}}}}' --type merge     
hostedcluster.hypershift.openshift.io/85ea85757a5a14355124 patched 

# oc get HostedCluster $hostedcluster -n clusters -ojson | jq .spec.configuration.apiServer.audit        
{
  "profile": "WriteRequestBodies"
}

3. Check Pod or operator restart to apply configuration changes. 

# oc get pods -l app=kube-apiserver  -n clusters-${hostedcluster}
NAME                              READY   STATUS    RESTARTS   AGE
kube-apiserver-7c98b66949-9z6rw   5/5     Running   0          36m
kube-apiserver-7c98b66949-gp5rx   5/5     Running   0          36m
kube-apiserver-7c98b66949-wmk8x   5/5     Running   0          36m

# oc get pods -l app=openshift-apiserver   -n clusters-${hostedcluster}
NAME                                  READY   STATUS    RESTARTS   AGE
openshift-apiserver-dc4c84ff4-566z9   3/3     Running   0          29m
openshift-apiserver-dc4c84ff4-99zq9   3/3     Running   0          29m
openshift-apiserver-dc4c84ff4-9xdrz   3/3     Running   0          30m

4. Check generated audit log.
# NOW=$(date -u "+%s"); echo "$NOW"; echo "$NOW" > now
1683711189

# kaspod=$(oc get pods -l app=kube-apiserver -n clusters-${hostedcluster} --no-headers -o=jsonpath={.items[0].metadata.name})                                     

# oc logs $kaspod -c audit-logs -n clusters-${hostedcluster} > kas-audit.log                                                                                      
# cat kas-audit.log | grep -iE '"verb":"(get|list|watch)","user":.*(requestObject|responseObject)' | jq -c 'select (.requestReceivedTimestamp | .[0:19] + "Z" | fromdateiso8601 > '"`cat now`)" | wc -l
0

# cat kas-audit.log | grep -iE '"verb":"(create|delete|patch|update)","user":.*(requestObject|responseObject)' | jq -c 'select (.requestReceivedTimestamp | .[0:19] + "Z" | fromdateiso8601 > '"`cat now`)" | wc -l
0  

None of the results should be zero.
In the backend, the configuration should be applied, or the pods/operator should restart after the configuration changes.

Actual results:

Config changes are not applied in the backend; neither the operator nor the pods restart.

Expected results:

Configuration should be applied, and the pods and operator should restart after config changes.

Additional info:

 

Description of problem:

'hostedcluster.spec.configuration.ingress.loadBalancer.platform.aws.type' is ignored
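For reference, the field in question sits at this path in the HostedCluster spec (fragment only):

spec:
  configuration:
    ingress:
      loadBalancer:
        platform:
          aws:
            type: NLB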

Version-Release number of selected component (if applicable):

 

How reproducible:

set field to 'NLB'

Steps to Reproduce:

1. set the field to 'NLB'
2.
3.

Actual results:

a classic load balancer is created

Expected results:

Should create a Network load balancer

Additional info:

 

Description of problem:

per oc set route-backends -h output:
Routes may have one or more optional backend services with weights controlling how much traffic flows to each service.
[...]
**If all weights are zero the route will not send traffic to any backends.**

this is not the case anymore for a route with a single backend.

Version-Release number of selected component (if applicable):

at least from OCP 4.12 onward

How reproducible:

all the time

Steps to Reproduce:

1. kubectl create -f example/
2. kubectl patch route example -p '{"spec":{"to": {"weight": 0}}}' --type merge
3. curl http://localhost -H "Host: example.local" 
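For reference, after the patch in step 2 the route looks roughly like this (the backing service name is assumed to match the route name):

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: example
spec:
  host: example.local
  to:
    kind: Service
    name: example
    weight: 0   # per the help text, weight 0 should send no traffic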

Actual results:

curl succeeds

Expected results:

curl fails

Additional info:

https://access.redhat.com/support/cases/#/case/03567697

This is a regression following NE-822. Reverting
https://github.com/openshift/router/commit/9656da7d5e2ac0962f3eaf718ad7a8c8b2172cfa makes it work again.

Description of problem:

4.14 cluster installation failed with TECH_PREVIEW featuregate

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-04-03-002631

How reproducible:

Always on GCP and Azure platform

Steps to Reproduce:

1. Install 4.14 cluster  with TECH_PREVIEW featuregate

Actual results:

Cluster installation failed and shows the errors below:

oc get pod -n openshift-kube-apiserver -l apiserver --show-labels                 

E0404 18:13:56.266461   73688 memcache.go:238] couldn't get current server API group list: Get "https://api.maxu-az-tp1.qe.azure.devcluster.openshift.com:6443/api?timeout=32s": dial tcp 20.253.227.131:6443: i/o timeout

E0404 18:14:26.270883   73688 memcache.go:238] couldn't get current server API group list: Get "https://api.maxu-az-tp1.qe.azure.devcluster.openshift.com:6443/api?timeout=32s": dial tcp 20.253.227.131:6443: i/o timeout

E0404 18:14:56.269363   73688 memcache.go:238] couldn't get current server API group list: Get "https://api.maxu-az-tp1.qe.azure.devcluster.openshift.com:6443/api?timeout=32s": dial tcp 20.253.227.131:6443: i/o timeout

E0404 18:14:58.075111   73688 memcache.go:255] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request

E0404 18:14:58.302392   73688 memcache.go:255] couldn't get resource list for security.openshift.io/v1: the server is currently unable to handle the request

E0404 18:14:58.309541   73688 memcache.go:255] couldn't get resource list for template.openshift.io/v1: the server is currently unable to handle the request

E0404 18:14:58.313497   73688 memcache.go:255] couldn't get resource list for packages.operators.coreos.com/v1: the server is currently unable to handle the request

NAME                                        READY   STATUS             RESTARTS        AGE   LABELS

kube-apiserver-maxu-az-tp1-86n5v-master-2   4/5     CrashLoopBackOff   7 (2m41s ago)   16m   apiserver=true,app=openshift-kube-apiserver,revision=16

Expected results:

Cluster installation should succeed and not show any errors.

Additional info:

https://issues.redhat.com/browse/OCPQE-14686

https://drive.google.com/file/d/1EHVuPFaSJA50R2k8uVVUVDvGDCfG9ZYN/view?usp=sharing

https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/?job=*4.14*-tp-*
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/?job=*4.14*-techpreview*

Description of problem:

When using a disconnected env and the OPENSHIFT_INSTALL_RELEASE_IMAGE_MIRROR env var is specified, the create-cluster-and-infraenv service fails[*].
The issue seems to happen due to a missing registries.conf in the assisted-service container, which is required for pulling the image.

[*]
create-cluster-and-infraenv[2784]: level=fatal msg="Failed to register cluster with assisted-service: command 'oc adm release info -o template --template '{{.metadata.version}}' --insecure=true quay.io/openshift-release-dev/ocp-release@sha256:3c050cb52fdd3e65c518d4999d238ec026ef724503f275377fee6bf0d33093ab --registry-config=/tmp/registry-config1560177852' exited with non-zero exit code 1: \nerror: unable to read image quay.io/openshift-release-dev/ocp-release@sha256:3c050cb52fdd3e65c518d4999d238ec026ef724503f275377fee6bf0d33093ab: Get "http://quay.io/v2/\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\n"
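The mirror configuration referred to above is typically fed to the installer via imageContentSources in install-config.yaml; an illustrative fragment (the local registry hostname is made up):

imageContentSources:
- mirrors:
  - local-registry.example.com:5000/openshift-release-dev/ocp-release
  source: quay.io/openshift-release-dev/ocp-release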

Version-Release number of selected component (if applicable):

4.14

How reproducible:

100%

Steps to Reproduce:

1. Add registries.conf with mirror config set to a local registry (e.g. use imageContentSources in install-config)
2. Ensure that a custom release image mirror that refers the registry is set on OPENSHIFT_INSTALL_RELEASE_IMAGE_MIRROR env var.
3. Boot the machine on a disconnected env.

Actual results:

The create-cluster-and-infraenv service fails to pull the release image.

Expected results:

create-cluster-and-infraenv service should finish successfully.

Additional info:

Pushed a PR to the installer for propagating registries.conf: https://github.com/openshift/installer/pull/7332

We have a workaround in the appliance by overriding the service:
https://github.com/openshift/appliance/pull/94/

 

This is a clone of issue OCPBUGS-22778. The following is the description of the original issue:

Description of problem:

After installing the "multicluster engine for Kubernetes" operator successfully (and creating the required resources), checking any resource's YAML tab shows a TypeError on the page.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-31-145859

How reproducible:

Always

Steps to Reproduce:

1. Install mce(multicluster engine for Kubernetes) operator from OperatorHub.
2. After the operator is installed successfully, check any resource's YAML tab.
3.

Actual results:

2. The page shows "Oh no! Something went wrong." with TypeError.

Expected results:

2. YAML tab should show yaml normally.

Additional info:

Screenshot: https://drive.google.com/file/d/1yMmdo40N2l_LtEBMM1s1vkHLoDx6-qRP/view?usp=sharing 

Description of the problem:

When patching the platform and leaving UMN unchanged, the logs show "false" instead of nil, which misleads us about whether the cluster will end up in an invalid state (e.g. none + UMN disabled).

 

time="2023-06-15T09:59:54Z" level=info msg="Platform verification completed, setting platform type to none and user-managed-networking to false" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).validateUpdateCluster" file="/assisted-service/internal/bminventory/inventory.go:1928" cluster_id=468bffe8-ce24-400e-a104-b0aab378eb75 go-id=94310 pkg=Inventory request_id=2fbb74ba-4390-4f27-b6fd-ee11ac1a7895 

 

Steps to reproduce:

1. Create cluster with platform == OCI or vSphere with UMN enabled

2.  Patch the cluster with "{"platform": {"type": "none"}}"

 

Actual results:

Log shows 

setting platform type to none and user-managed-networking to false 

 

Expected results:

setting platform type to none and user-managed-networking to nil

Description of problem:

Since registry.centos.org is closed, tests relying on this registry in e2e-agnostic-ovn-cmd job are failing.

Version-Release number of selected component (if applicable):

all

How reproducible:

Trigger e2e-agnostic-ovn-cmd job

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

A customer is facing console slowness when loading a workloads page that has 300+ workloads.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

 

Steps to Reproduce:

1. Login to OCP console
2. Workloads -> Projects -> Project -> Deployment Configs (300+)
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-22497. The following is the description of the original issue:

While trying to develop a demo for a Java application that first builds using the source-to-image strategy and then uses the resulting image to copy artefacts from the s2i-builder+compiled-sources image to a slimmer runtime image using an inline Dockerfile build strategy on OpenShift, the deployment fails because the inline Dockerfile build doesn't preserve the modification time of the file that gets copied. This is different from how 'docker' itself does it with a multi-stage build.

Version-Release number of selected component (if applicable):

4.12.14

How reproducible:

Always

Steps to Reproduce:

1. git clone https://github.com/jerboaa/quarkus-quickstarts
2. cd quarkus-quickstarts && git checkout ocp-bug-inline-docker
3. oc new-project quarkus-appcds-nok
4. oc process -f rest-json-quickstart/openshift/quarkus_runtime_appcds_template.yaml | oc create -f -

Actual results:

$ oc logs quarkus-rest-json-appcds-4-xc47z
INFO exec -a "java" java -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:+ExitOnOutOfMemoryError -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -Xshare:on -XX:SharedArchiveFile=/deployments/app-cds.jsa -Dquarkus.http.host=0.0.0.0 -cp "." -jar /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar 
INFO running in /deployments
Error occurred during initialization of VM
Unable to use shared archive.
An error has occurred while processing the shared archive file.
A jar file is not the one used while building the shared archive file: rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar

Expected results:

Starting the Java application using /opt/jboss/container/java/run/run-java.sh ...
INFO exec -a "java" java -XX:MaxRAMPercentage=80.0 -XX:+UseParallelGC -XX:MinHeapFreeRatio=10 -XX:MaxHeapFreeRatio=20 -XX:GCTimeRatio=4 -XX:AdaptiveSizePolicyWeight=90 -XX:+ExitOnOutOfMemoryError -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -Xshare:on -XX:SharedArchiveFile=/deployments/app-cds.jsa -Dquarkus.http.host=0.0.0.0 -cp "." -jar /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar 
INFO running in /deployments
__  ____  __  _____   ___  __ ____  ______ 
 --/ __ \/ / / / _ | / _ \/ //_/ / / / __/ 
 -/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \   
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/   
2023-10-27 18:13:01,866 INFO  [io.quarkus] (main) rest-json-quickstart 1.0.0-SNAPSHOT on JVM (powered by Quarkus 3.4.3) started in 0.966s. Listening on: http://0.0.0.0:8080
2023-10-27 18:13:01,867 INFO  [io.quarkus] (main) Profile prod activated. 
2023-10-27 18:13:01,867 INFO  [io.quarkus] (main) Installed features: [cdi, resteasy-reactive, resteasy-reactive-jackson, smallrye-context-propagation, vertx]

Additional info:

When deploying with AppCDS turned on, we can get the pods to start, and when we then look at the modified file time of the offending file, we notice that it differs between the original s2i-merge-image (A) and the runtime image (B):

(A)
$ oc rsh quarkus-rest-json-appcds-s2i-1-x5hct stat /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  File: /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  Size: 16057039  	Blocks: 31368      IO Block: 4096   regular file
Device: 200001h/2097153d	Inode: 60146490    Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  185/ default)   Gid: (    0/    root)
Access: 2023-10-27 18:11:22.000000000 +0000
Modify: 2023-10-27 18:11:22.000000000 +0000
Change: 2023-10-27 18:11:41.555586774 +0000
 Birth: 2023-10-27 18:11:41.491586774 +0000

(B)
$ oc rsh quarkus-rest-json-appcds-1-l7xw2 stat /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  File: /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  Size: 16057039  	Blocks: 31368      IO Block: 4096   regular file
Device: 2000a3h/2097315d	Inode: 71601163    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-10-27 18:11:44.000000000 +0000
Modify: 2023-10-27 18:11:44.000000000 +0000
Change: 2023-10-27 18:12:12.169087346 +0000
 Birth: 2023-10-27 18:12:12.114087346 +0000

Both should have 'Modify: 2023-10-27 18:11:22.000000000 +0000'.

When I perform a local s2i build of the same application sources and then use this multi-stage Dockerfile, the modify times of the files remain the same.

FROM quarkus-app-uberjar:ubi9 as s2iimg

FROM registry.access.redhat.com/ubi9/openjdk-17-runtime as final
COPY --from=s2iimg /deployments/* /deployments/
ENV JAVA_OPTS_APPEND="-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -Xshare:on -XX:SharedArchiveFile=app-cds.jsa"

as shown here:

$ sudo docker run --rm -ti --entrypoint /bin/bash quarkus-app-uberjar:ubi9 -c 'stat /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar'
  File: /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  Size: 16057020  	Blocks: 31368      IO Block: 4096   regular file
Device: 6fh/111d	Inode: 276781319   Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  185/ default)   Gid: (    0/    root)
Access: 2023-10-27 15:52:28.000000000 +0000
Modify: 2023-10-27 15:52:28.000000000 +0000
Change: 2023-10-27 15:52:37.352926632 +0000
 Birth: 2023-10-27 15:52:37.288926109 +0000
$ sudo docker run --rm -ti --entrypoint /bin/bash quarkus-cds-app -c 'stat /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar'
  File: /deployments/rest-json-quickstart-1.0.0-SNAPSHOT-runner.jar
  Size: 16057020  	Blocks: 31368      IO Block: 4096   regular file
Device: 6fh/111d	Inode: 14916403    Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  185/ default)   Gid: (    0/    root)
Access: 2023-10-27 15:52:28.000000000 +0000
Modify: 2023-10-27 15:52:28.000000000 +0000
Change: 2023-10-27 15:53:04.408147760 +0000
 Birth: 2023-10-27 15:53:04.346147253 +0000

Both have a modified file time of 2023-10-27 15:52:28.000000000 +0000

What

Address issues and PRs.

In particular:

  • Make downstream version bump
  • Merge Standa's Open PR.

Why

A healthy open source repo stays maintained and retains its users.

Description of problem:

API fields that are defaulted by a controller should document what their default is for each release version.
Currently the field documents that "if empty, subject to platform chosen default", but it does not state what that is.

To fix this, please add, after the platform chosen default prose:
// The current default is XYZ.

This will allow users to track the platform defaults over time from the API documentation.

I would like to see this fixed before 4.13 and 4.14 are released, please; it should be pretty quick to fix if we understand what those defaults are.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

We need to improve our must-gather so that we can collect the CRs on which the vSphere CSI driver depends.

IMO they contain vital cluster state, and not collecting them makes certain parts of CSI driver debugging much harder than it needs to be.

Description of problem:

Since we migrated some of our jobs to OCP 4.14, we have been experiencing a lot of flakiness with the "openshift-tests" binary, which panics when trying to retrieve the logs of etcd: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_assisted-test-infra/2212/pull-ci-openshift-assisted-test-infra-master-e2e-metal-assisted/1673615526967906304#1:build-log.txt%3A161-191

Here's the impact on our jobs:
https://search.ci.openshift.org/?search=error+reading+pod+logs&maxAge=48h&context=1&type=build-log&name=.*assisted.*&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Version-Release number of selected component (if applicable):

 N/A

How reproducible:

Happens from time to time against OCP 4.14

Steps to Reproduce:

1. Provision an OCP cluster 4.14
2. Run the conformance tests on it with "openshift-tests"

Actual results:


The binary "openshift-tests" panics from time to time:

 [2023-06-27 10:12:07] time="2023-06-27T10:12:07Z" level=error msg="error reading pod logs" error="container \"etcd\" in pod \"etcd-test-infra-cluster-a1729bd4-master-2\" is not available" pod=etcd-test-infra-cluster-a1729bd4-master-2
[2023-06-27 10:12:07] panic: runtime error: invalid memory address or nil pointer dereference
[2023-06-27 10:12:07] [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x26eb9b5]
[2023-06-27 10:12:07] 
[2023-06-27 10:12:07] goroutine 1 [running]:
[2023-06-27 10:12:07] bufio.(*Scanner).Scan(0xc005954250)
[2023-06-27 10:12:07] 	bufio/scan.go:214 +0x855
[2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation.IntervalsFromPodLogs({0x8d91460, 0xc004a43d40}, {0xc8b83c0?, 0xc006138000?, 0xc8b83c0?}, {0x8d91460?, 0xc004a43d40?, 0xc8b83c0?})
[2023-06-27 10:12:07] 	github.com/openshift/origin/pkg/monitor/intervalcreation/podlogs.go:130 +0x8cd
[2023-06-27 10:12:07] github.com/openshift/origin/pkg/monitor/intervalcreation.InsertIntervalsFromCluster({0x8d441e0, 0xc000ffd900}, 0xc0008b4000?, {0xc005f88000?, 0x539, 0x0?}, 0x25e1e39?, {0xc11ecb5d446c4f2c, 0x4fb99e6af, 0xc8b83c0}, ...)
[2023-06-27 10:12:07] 	github.com/openshift/origin/pkg/monitor/intervalcreation/types.go:65 +0x274
[2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo.(*MonitorEventsOptions).End(0xc001083050, {0x8d441e0, 0xc000ffd900}, 0x1?, {0x7fff15b2ccde, 0x16})
[2023-06-27 10:12:07] 	github.com/openshift/origin/pkg/test/ginkgo/options_monitor_events.go:170 +0x225
[2023-06-27 10:12:07] github.com/openshift/origin/pkg/test/ginkgo.(*Options).Run(0xc0013e2000, 0xc00012e380, {0x8126d1e, 0xf})
[2023-06-27 10:12:07] 	github.com/openshift/origin/pkg/test/ginkgo/cmd_runsuite.go:506 +0x2d9a
[2023-06-27 10:12:07] main.newRunCommand.func1.1()
[2023-06-27 10:12:07] 	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:330 +0x2d4
[2023-06-27 10:12:07] main.mirrorToFile(0xc0013e2000, 0xc0014cdb30)
[2023-06-27 10:12:07] 	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:476 +0x5f2
[2023-06-27 10:12:07] main.newRunCommand.func1(0xc0013e0300?, {0xc000862ea0?, 0x6?, 0x6?})
[2023-06-27 10:12:07] 	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:311 +0x5c
[2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).execute(0xc0013e0300, {0xc000862e40, 0x6, 0x6})
[2023-06-27 10:12:07] 	github.com/spf13/cobra@v1.6.0/command.go:916 +0x862
[2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).ExecuteC(0xc0013e0000)
[2023-06-27 10:12:07] 	github.com/spf13/cobra@v1.6.0/command.go:1040 +0x3bd
[2023-06-27 10:12:07] github.com/spf13/cobra.(*Command).Execute(...)
[2023-06-27 10:12:07] 	github.com/spf13/cobra@v1.6.0/command.go:968
[2023-06-27 10:12:07] main.main.func1(0xc00011b300?)
[2023-06-27 10:12:07] 	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:96 +0x8a
[2023-06-27 10:12:07] main.main()
[2023-06-27 10:12:07] 	github.com/openshift/origin/cmd/openshift-tests/openshift-tests.go:97 +0x516 

Expected results:

No panics

Additional info:

The source of the panic has been pin-pointed here: https://github.com/openshift/origin/pull/27772#discussion_r1243600596

Description of problem:

etcd pods running in a hypershift control plane use an exec probe to check cluster health and have a very small timeout (1s). We should be using the same probe as standalone etcd, with a 30s timeout.

Version-Release number of selected component (if applicable):

All

How reproducible:

Always

Steps to Reproduce:

1. Create a hypershift hosted cluster
2. Examine etcd pod(s) yaml

Actual results:

Probe is of type exec and has a timeout of 1s

Expected results:

Probe is of type http and has a timeout of 30s
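A sketch of the desired probe shape; the path and port here are assumptions modeled on standalone etcd, not values taken from this report:

readinessProbe:
  httpGet:
    path: /readyz
    port: 9980
    scheme: HTTPS
  timeoutSeconds: 30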

Additional info:

 

Please review the following PR: https://github.com/operator-framework/operator-marketplace/pull/535

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Serverless -> Eventing -> Channels: values under the Conditions column are in English.
Translator comments:
"x OK/y" should be translated as "x个 OK(共y个)"

Version-Release number of selected component (if applicable):

4.13.0-ec.1

How reproducible:

always

Steps to Reproduce:

1. Navigate to Serverless -> Eventing -> Channels.
2. Values under Conditions column are in English.
3.

Actual results:

Content is in English.

Expected results:

Content should be in target language. x OK/y" should be translated as "x个 OK(共y个)"

Additional info:

screenshot provided

Marko Luksa mentioned that multus is missing the '/etc/cni/multus/net.d' mount in OCP 4.14, and here are the repro steps (verified by the multus team):

Our original reproducer would be too complex, so I had to write a simple one for you:
Use a 4.14 OpenShift cluster
Create the CNI plugin installer DaemonSet in namespace test:

oc apply -f https://gist.githubusercontent.com/luksa/c4d444e918124604839c424339c29a62/raw/1454bd389138980ea3f93bcfaf6026d4821e3543/noop-cni-plugin-installer.yaml

Create the test Deployment:

oc apply -f https://gist.githubusercontent.com/luksa/4c7c144ef88b1b0d8f772d6eacdeec14/raw/06b161fdb8c71406f4531d35550bd507a6a25200/test-deployment.yaml

Describe the test pod:

oc -n test describe po test

The last event shows the following:

ERRORED: error configuring pod [test/test-6cf67dcfb6-hgszq] networking: Multus: [test/test-6cf67dcfb6-hgszq/3e8a6f0d-ce84-4885-a7a7-43506669339f]: error loading k8s delegates k8s args: TryLoadPodDelegates: error in getting k8s network for pod: GetNetworkDelegates: failed getting the delegate: GetCNIConfig: err in GetCNIConfigFromFile: No networks found in /etc/cni/multus/net.d

The same reproducer runs fine on OCP 4.13
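A quick way to check whether the mount is present on a given cluster (a sketch; it assumes the daemonset is named multus in the openshift-multus namespace):

oc -n openshift-multus get ds multus -o yaml | grep -n 'cni/multus/net.d'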

Description of problem:

If we add a configmap to a buildconfig as a build input, the configmap data is not present at the destinationDir on the build pod.

Version-Release number of selected component (if applicable):

 

How reproducible:

Follow below steps to reproduce.

Steps to Reproduce:

1. Create a configmap to pass as build input

apiVersion: v1
data:
  settings.xml: |+
    xxx
    yyy
kind: ConfigMap
metadata:
  name: build-test
  namespace: test

2. Create a buildconfig like the one below

apiVersion: build.openshift.io/v1
kind: BuildConfig
metadata:
  labels:
    app: custom-build
  name: custom-build
spec:
  source:
    configMaps:
    - configMap:
        name: build-test
      destinationDir: /tmp
    type: None
  output:
    to:
      kind: ImageStreamTag
      name: custom-build:latest
  postCommit: {}
  runPolicy: Serial
  strategy:
    customStrategy:
      from:
        kind: "DockerImage"
        name: "registry.redhat.io/rhel8/s2i-base"

 3. start a new build

    oc start-build custom-build

 4. As per the documentation[a], the configmap data should be present at the build pod location "/var/run/secrets/openshift.io/build" if we didn't explicitly set the "destinationDir". In the above example "destinationDir" is set to "/tmp", so the "settings.xml" file from the configmap should be present in the "/tmp" directory of the build pod.
 
[a] https://docs.openshift.com/container-platform/4.12/cicd/builds/creating-build-inputs.html#builds-custom-strategy_creating-build-inputs
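One way to verify is to inspect the running build pod directly; a sketch, assuming the usual <buildconfig-name>-<n>-build pod naming:

oc -n test rsh custom-build-1-build ls -l /tmp /var/run/secrets/openshift.io/build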

Actual results:

Configmap data is not present in the "destinationDir" or in the default location "/var/run/secrets/openshift.io/build".

Expected results:

Configmap data should be present on the destinationDir of the builder pod.

Additional info:

 

Description of problem:

We shouldn't enforce PSa in 4.14, neither by label sync nor by global cluster config.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

100%

Steps to Reproduce:

As a cluster admin:
1. create two new namespaces/projects: pokus, openshift-pokus
2. as a cluster-admin, attempt to create a privileged pod in both the namespaces from 1.

Actual results:

pod creation is blocked by pod security admission

Expected results:

only a warning about pod violating the namespace pod security level should be emitted

Additional info:


We have occasional cases where admins attempt a rollback, despite long-standing docs:

Only upgrading to a newer version is supported. Reverting or rolling back your cluster to a previous version is not supported. If your update fails, contact Red Hat support.

Deeper history for that content is available here, here, and here. We could refuse to accept rollbacks unless the administrator sets Force to waive our guards.
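If such a guard existed, the natural opt-out would be the existing force flag on the desired update in the ClusterVersion spec; an illustrative fragment (the version value is hypothetical):

spec:
  desiredUpdate:
    version: 4.13.22
    force: true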

Description of problem:

I have deployed multicluster-engine.v2.3.0-81 with a spoke cluster on 4.12.ec5

In the assisted pod I see data collection is enabled:
sh-4.4$ env | grep DATA
DATA_UPLOAD_ENDPOINT=https://console.redhat.com/api/ingress/v1/upload
ENABLE_DATA_COLLECTION=True 

But : in AI logs I see "Event uploading is not enabled"

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

  1. Deploy multicluster-engine.v2.3.0-81 with a spoke cluster on 4.12.ec5
  2. Check the logs and env vars in the pod
  3. ...

Actual results:

in AI logs I see "Event uploading is not enabled"

Expected results:

Data should be uploaded

Additional info:

Description of problem:

When the OIDC provider is deleted on the customer side, AWS resource deletion is not skipped in cases where the ValidAWSIdentityProvider condition is 'Unknown'.

This results in clusters being stuck during deletion.

Version-Release number of selected component (if applicable):

4.12.z, 4.13.z, 4.14.z

How reproducible:

Irregular

Steps to Reproduce:

1.
2.
3.

Actual results:

Cluster stuck in uninstallation

Expected results:

Clusters are not stuck in uninstallation; removal of the customer's AWS resources is skipped.

Additional info:

Added MG for all HyperShift-related NS

The bug seems to be at https://github.com/openshift/hypershift/pull/2281/files#diff-f90ab1b32c9e1b349f04c32121d59f5e9081ccaf2be490f6782165d2960bc6c7R295 : 'Unknown' needs to be added to the check of whether OIDC is valid or not.

Description of problem:

Arm HCPs are currently broken. The following error message was observed in the ignition-server pod:

{"level":"error","ts":"2023-06-29T13:38:19Z","msg":"Reconciler error","controller":"secret","controllerGroup":"","controllerKind":"Secret","secret":{"name":"token-brcox-hypershift-arm-us-east-1a-dbe0ce2a","namespace":"clusters-brcox-hypershift-arm"},"namespace":"clusters-brcox-hypershift-arm","name":"token-brcox-hypershift-arm-us-east-1a-dbe0ce2a","reconcileID":"ff813140-d10a-464e-a1b0-c05859b64ef9","error":"error getting ignition payload: failed to execute cluster-config-operator: cluster-config-operator process failed: /bin/bash: line 21: /payloads/get-payload1590526115/bin/cluster-config-operator: cannot execute binary file: Exec format error\n: exit status 126","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal...

Version-Release number of selected component (if applicable):

 

How reproducible:

Every time

Steps to Reproduce:

1. Create an Arm Mgmt Cluster
2. Create an Arm HCP

Actual results:

Error message in ignition-server pod and failure to generate appropriate payload.

Expected results:

ignition-server picks the appropriate arch based on the mgmt cluster.

Additional info:

 

This is a clone of issue OCPBUGS-18485. The following is the description of the original issue:

Description of problem:

In the developer console, go to "Observe -> openshift-monitoring -> Alerts" and silence the Watchdog alert. At first the alert state is Silenced in the Alerts tab, but it changes to Firing quickly (the alert is actually silenced); see the attached screenshot

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-02-132842

How reproducible:

always

Steps to Reproduce:

1. silence alert in the dev console, and check alert state in Alerts tab
2.
3.

Actual results:

alert state is changed from Silenced to Firing quickly

Expected results:

state should be Silenced

Description of problem:

Machine config pool selection fails when a single node has master+custom roles. The controller logs the error but the node is not marked as degraded, so the end user does not know about this error. No config can be applied on the node.

Version-Release number of selected component (if applicable):

4.12, 4.11.z

Steps to Reproduce:

1. setup SNO cluster
2. create a custom MCP (see the example manifest below)
3. add custom mcp label on the node
4. check mcc pod log to see the error message about pool selection 
5. create mc to apply config
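For illustration, a custom MCP as described in steps 2 and 3 could be created with a manifest like the following sketch (the pool name "custom" and the node label are examples, not taken from the original report):

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: custom
spec:
  machineConfigSelector:
    matchExpressions:
    - key: machineconfiguration.openshift.io/role
      operator: In
      values:
      - master
      - custom
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/custom: ""

The matching label for step 3 would then be node-role.kubernetes.io/custom: "" on the single node.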

Actual results:

Node state is good, but the single node cannot be assigned to any MCP.

Expected results:

The node should be marked as degraded with an error message.

Additional info:

 

This is a clone of issue OCPBUGS-22166. The following is the description of the original issue:

Description of problem:

network-tools -h
error: You must be logged in to the server (Unauthorized)
error: You must be logged in to the server (Unauthorized)
Usage: network-tools [command]

Version-Release number of selected component (if applicable):

 

How reproducible:

always

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

As a developer, I would like to make sure we are using the latest versions of the dependencies we utilize in the /hack/tools/go.mod file.

Please review the following PR: https://github.com/openshift/machine-api-operator/pull/1137

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Altering the ImageURL or ExtraKernelParams values in a PreprovisioningImage CR should cause the host to boot using the new image or parameters, but currently the host doesn't respond at all to changes in those fields.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-01-11-225449

How reproducible:

Always

Steps to Reproduce:

1. Create a BMH
2. Set preprovisioning image image URL
3. Allow host to boot
4. Change image URL or extra kernel params

Actual results:

Host does not reboot

Expected results:

Host reboots using the newly provided image or parameters

Additional info:
BMH:

- apiVersion: metal3.io/v1alpha1
  kind: BareMetalHost
  metadata:
    annotations:
      inspect.metal3.io: disabled
    creationTimestamp: "2023-01-13T16:06:12Z"
    finalizers:
    - baremetalhost.metal3.io
    generation: 4
    labels:
      infraenvs.agent-install.openshift.io: myinfraenv
    name: ostest-extraworker-0
    namespace: assisted-installer
    resourceVersion: "61077"
    uid: 444d7246-3d0a-4188-a8c4-f407ee4f741f
  spec:
    automatedCleaningMode: disabled
    bmc:
      address: redfish+http://192.168.111.1:8000/redfish/v1/Systems/6f45ba9f-251a-46f7-a7a8-10c6ca9231dd
      credentialsName: ostest-extraworker-0-bmc-secret
    bootMACAddress: 00:b2:71:b8:14:4f
    customDeploy:
      method: start_assisted_install
    online: true
  status:
    errorCount: 0
    errorMessage: ""
    goodCredentials:
      credentials:
        name: ostest-extraworker-0-bmc-secret
        namespace: assisted-installer
      credentialsVersion: "44478"
    hardwareProfile: unknown
    lastUpdated: "2023-01-13T16:06:22Z"
    operationHistory:
      deprovision:
        end: null
        start: null
      inspect:
        end: null
        start: null
      provision:
        end: null
        start: "2023-01-13T16:06:22Z"
      register:
        end: "2023-01-13T16:06:22Z"
        start: "2023-01-13T16:06:12Z"
    operationalStatus: OK
    poweredOn: false
    provisioning:
      ID: b5e8c1a9-8061-420b-8c32-bb29a8b35a0b
      bootMode: UEFI
      image:
        url: ""
      raid:
        hardwareRAIDVolumes: null
        softwareRAIDVolumes: []
      rootDeviceHints:
        deviceName: /dev/sda
      state: provisioning
    triedCredentials:
      credentials:
        name: ostest-extraworker-0-bmc-secret
        namespace: assisted-installer
      credentialsVersion: "44478"
 

Preprovisioning Image (with changes)

- apiVersion: metal3.io/v1alpha1
  kind: PreprovisioningImage
  metadata:
    creationTimestamp: "2023-01-13T16:06:22Z"
    generation: 1
    labels:
      infraenvs.agent-install.openshift.io: myinfraenv
    name: ostest-extraworker-0
    namespace: assisted-installer
    ownerReferences:
    - apiVersion: metal3.io/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: BareMetalHost
      name: ostest-extraworker-0
      uid: 444d7246-3d0a-4188-a8c4-f407ee4f741f
    resourceVersion: "56838"
    uid: 37f4da76-0d1c-4e05-b618-2f0ab9d5c974
  spec:
    acceptFormats:
    - initrd
    architecture: x86_64
  status:
    architecture: x86_64
    conditions:
    - lastTransitionTime: "2023-01-13T16:34:26Z"
      message: Image has been created
      observedGeneration: 1
      reason: ImageCreated
      status: "True"
      type: Ready
    - lastTransitionTime: "2023-01-13T16:06:24Z"
      message: Image has been created
      observedGeneration: 1
      reason: ImageCreated
      status: "False"
      type: Error
    extraKernelParams: coreos.live.rootfs_url=https://assisted-image-service-assisted-installer.apps.ostest.test.metalkube.org/boot-artifacts/rootfs?arch=x86_64&version=4.12
      rd.break=initqueue
    format: initrd
    imageUrl: https://assisted-image-service-assisted-installer.apps.ostest.test.metalkube.org/images/79ef3924-ee94-42c6-96c3-2d784283120d/pxe-initrd?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI3OWVmMzkyNC1lZTk0LTQyYzYtOTZjMy0yZDc4NDI4MzEyMGQifQ.YazOZS01NoI7g_eVhLmRNmM6wKVVaZJdWbxuePia46Fo0GMLYtSOp1JTvtcStoT51g7VkSnTf8LBJ0zmbGu3HQ&arch=x86_64&version=4.12
    kernelUrl: https://assisted-image-service-assisted-installer.apps.ostest.test.metalkube.org/boot-artifacts/kernel?arch=x86_64&version=4.12
    networkData: {}

This was found while testing ZTP so in this case the assisted-service controllers are altering the preprovisioning image in response to changes made in the assisted-specific CRs, but I don't think this issue is ZTP specific.
 

Description of problem:

OpenShift Container Platform 4.12.5 installation with the IPI installation method on Microsoft Azure is showing undesired behavior when trying to curl "https://api.<clustername>.<domain>:6443/readyz". When using `HostNetwork` it all works without any issues. But when doing the same request from a pod that does not have `HostNetwork` capabilities and therefore has an IP from the SDN range, a large portion of the requests fail.

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.5    True        False         29m     Cluster version is 4.12.5

$ oc get network cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Network
metadata:
  creationTimestamp: "2023-03-10T13:12:06Z"
  generation: 2
  name: cluster
  resourceVersion: "2975"
  uid: e1e9c464-526c-4ebf-ab84-0deedf092cac
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  externalIP:
    policy: {}
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
status:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  clusterNetworkMTU: 1400
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16

$ oc get infrastructure cluster -o yaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-03-10T13:12:04Z"
  generation: 1
  name: cluster
  resourceVersion: "430"
  uid: 5c260276-d901-40f7-a28c-172c492e81e6
spec:
  cloudConfig:
    key: config
    name: cloud-provider-config
  platformSpec:
    type: Azure
status:
  apiServerInternalURI: https://api-int.clustername.domain.lab:6443
  apiServerURL: https://api.clustername.domain.lab:6443
  controlPlaneTopology: HighlyAvailable
  etcdDiscoveryDomain: ""
  infrastructureName: sreberazure-njj24
  infrastructureTopology: HighlyAvailable
  platform: Azure
  platformStatus:
    azure:
      cloudName: AzurePublicCloud
      networkResourceGroupName: sreberazure-njj24-rg
      resourceGroupName: sreberazure-njj24-rg
    type: Azure

$ oc project openshift-apiserver
Already on project "openshift-apiserver" on server "https://api.clustername.domain.lab:6443".
$ oc get pod
NAME                         READY   STATUS    RESTARTS   AGE
apiserver-6f58784797-kq4kr   2/2     Running   0          41m
apiserver-6f58784797-l69jr   2/2     Running   0          38m
apiserver-6f58784797-nn6tn   2/2     Running   0          45m

$ oc get pod -o wide
NAME                         READY   STATUS    RESTARTS   AGE   IP            NODE                         NOMINATED NODE   READINESS GATES
apiserver-6f58784797-kq4kr   2/2     Running   0          42m   10.130.0.21   sreberazure-njj24-master-0   <none>           <none>
apiserver-6f58784797-l69jr   2/2     Running   0          38m   10.129.0.29   sreberazure-njj24-master-2   <none>           <none>
apiserver-6f58784797-nn6tn   2/2     Running   0          45m   10.128.0.36   sreberazure-njj24-master-1   <none>           <none>

$ oc rsh apiserver-6f58784797-l69jr
Defaulted container "openshift-apiserver" out of: openshift-apiserver, openshift-apiserver-check-endpoints, fix-audit-permissions (init)
sh-4.4# while true; do curl -k --connect-timeout 1  https://api.clustername.domain.lab:6443/readyz; sleep 1; done
curl: (28) Connection timed out after 1000 milliseconds
okokokcurl: (28) Connection timed out after 1001 milliseconds
okokcurl: (28) Connection timed out after 1003 milliseconds
curl: (28) Connection timed out after 1001 milliseconds
curl: (28) Connection timed out after 1001 milliseconds
okokokokokokokokokcurl: (28) Connection timed out after 1001 milliseconds
okokcurl: (28) Connection timed out after 1001 milliseconds
curl: (28) Connection timed out after 1001 milliseconds
^C
sh-4.4# exit
exit
command terminated with exit code 130

$ oc project openshift-kube-apiserver
Now using project "openshift-kube-apiserver" on server "https://api.clustername.domain.lab:6443".

$ oc get pod -o wide
NAME                                              READY   STATUS      RESTARTS   AGE   IP            NODE                         NOMINATED NODE   READINESS GATES
apiserver-watcher-sreberazure-njj24-master-0      1/1     Running     0          55m   10.0.0.6      sreberazure-njj24-master-0   <none>           <none>
apiserver-watcher-sreberazure-njj24-master-1      1/1     Running     0          57m   10.0.0.8      sreberazure-njj24-master-1   <none>           <none>
apiserver-watcher-sreberazure-njj24-master-2      1/1     Running     0          57m   10.0.0.7      sreberazure-njj24-master-2   <none>           <none>
installer-2-sreberazure-njj24-master-2            0/1     Completed   0          51m   10.129.0.27   sreberazure-njj24-master-2   <none>           <none>
installer-3-sreberazure-njj24-master-2            0/1     Completed   0          50m   10.129.0.32   sreberazure-njj24-master-2   <none>           <none>
installer-4-sreberazure-njj24-master-2            0/1     Completed   0          49m   10.129.0.36   sreberazure-njj24-master-2   <none>           <none>
installer-5-sreberazure-njj24-master-2            0/1     Completed   0          46m   10.129.0.15   sreberazure-njj24-master-2   <none>           <none>
installer-6-sreberazure-njj24-master-0            0/1     Completed   0          37m   10.130.0.27   sreberazure-njj24-master-0   <none>           <none>
installer-6-sreberazure-njj24-master-1            0/1     Completed   0          39m   10.128.0.45   sreberazure-njj24-master-1   <none>           <none>
installer-6-sreberazure-njj24-master-2            0/1     Completed   0          36m   10.129.0.37   sreberazure-njj24-master-2   <none>           <none>
kube-apiserver-guard-sreberazure-njj24-master-0   1/1     Running     0          37m   10.130.0.29   sreberazure-njj24-master-0   <none>           <none>
kube-apiserver-guard-sreberazure-njj24-master-1   1/1     Running     0          38m   10.128.0.47   sreberazure-njj24-master-1   <none>           <none>
kube-apiserver-guard-sreberazure-njj24-master-2   1/1     Running     0          50m   10.129.0.31   sreberazure-njj24-master-2   <none>           <none>
kube-apiserver-sreberazure-njj24-master-0         5/5     Running     0          37m   10.0.0.6      sreberazure-njj24-master-0   <none>           <none>
kube-apiserver-sreberazure-njj24-master-1         5/5     Running     0          38m   10.0.0.8      sreberazure-njj24-master-1   <none>           <none>
kube-apiserver-sreberazure-njj24-master-2         5/5     Running     0          34m   10.0.0.7      sreberazure-njj24-master-2   <none>           <none>
revision-pruner-6-sreberazure-njj24-master-0      0/1     Completed   0          33m   10.130.0.35   sreberazure-njj24-master-0   <none>           <none>
revision-pruner-6-sreberazure-njj24-master-1      0/1     Completed   0          33m   10.128.0.56   sreberazure-njj24-master-1   <none>           <none>
revision-pruner-6-sreberazure-njj24-master-2      0/1     Completed   0          33m   10.129.0.39   sreberazure-njj24-master-2   <none>           <none>

$ oc rsh kube-apiserver-sreberazure-njj24-master-1
sh-4.4# while true; do curl -k --connect-timeout 1  https://api.clustername.domain.lab:6443/readyz; sleep 1; done
okokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokok

Also, changing curl's `--connect-timeout 1` to `--connect-timeout 10`, for example, does not have any impact. It simply takes longer until the timeout is reached.

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.12 (also previous version were not tested)

How reproducible:

Always

Steps to Reproduce:

1. Install OpenShift Container Platform 4.12 on Azure using IPI install method and set the SDN to OVN-Kubernetes
2. Once successfully installed run `oc project openshift-apiserver`
3. rsh apiserver-<podID>
4. while true; do curl -k --connect-timeout 1  https://api.clustername.domain.lab:6443/readyz; sleep 1; done

Actual results:

sh-4.4# while true; do curl -k --connect-timeout 1  https://api.clustername.domain.lab:6443/readyz; sleep 1; done
curl: (28) Connection timed out after 1000 milliseconds
okokokcurl: (28) Connection timed out after 1001 milliseconds
okokcurl: (28) Connection timed out after 1003 milliseconds
curl: (28) Connection timed out after 1001 milliseconds
curl: (28) Connection timed out after 1001 milliseconds
okokokokokokokokokcurl: (28) Connection timed out after 1001 milliseconds
okokcurl: (28) Connection timed out after 1001 milliseconds
curl: (28) Connection timed out after 1001 milliseconds
 

Expected results:

sh-4.4# while true; do curl -k --connect-timeout 1  https://api.clustername.domain.lab:6443/readyz; sleep 1; done
okokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokokok
 

Additional info:

 

One of the 4.13 nightly payload tests is failing and it seems like kernel-uname-r is needed in base RHCOS.

Error message from the rpm-ostree rebase:

 Problem: package kernel-modules-core-5.14.0-284.25.1.el9_2.x86_64 requires kernel-uname-r = 5.14.0-284.25.1.el9_2.x86_64, but none of the providers can be installed
  - conflicting requests

MCD pod log: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-rt-upgrade/1686324400581775360/artifacts/e2e-gcp-ovn-rt-upgrade/gather-extra/artifacts/pods/openshift-machine-config-operator_machine-config-daemon-bjhq4_machine-config-daemon.log

Perhaps something changed recently in packaging.

Description of problem:

vSphere dual-stack added support for both IPv4 and IPv6 in kubelet --node-ip;
however, the masters are booting without the IPv6 address in --node-ip.

"Ignoring filtered route {Ifindex: 2 Dst: <nil> Src: 192.168.130.19 Gw: 192.168.130.1 Flags: [] Table: 254}"
"Ignoring filtered route {Ifindex: 2 Dst: 192.168.130.0/24 Src: 192.168.130.19 Gw: <nil> Flags: [] Table: 254}"
"Ignoring filtered route {Ifindex: 2 Dst: fd65:a1a8:60ad:271c::22/128 Src: <nil> Gw: <nil> Flags: [] Table: 254}"
"Ignoring filtered route {Ifindex: 2 Dst: fe80::/64 Src: <nil> Gw: <nil> Flags: [] Table: 254}"
"Ignoring filtered route {Ifindex: 2 Dst: <nil> Src: <nil> Gw: fe80::9eb4:f9fa:2b8d:8372 Flags: [] Table: 254}"

"Writing Kubelet service override with content [Service]\nEnvironment=\"KUBELET_NODE_IP=192.168.130.19\" \"KUBELET_NODE_IPS=192.168.130.19\"\n"

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-28-154013

How reproducible:

Intermittent (DHCPv6 related)

Steps to Reproduce:

1. install vsphere dual-stack IPI with DHCPv6


networking:
  clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    - cidr: fd65:10:128::/56
      hostPrefix: 64
  machineNetwork:
    - cidr: 192.168.0.0/16
    - cidr: fd65:a1a8:60ad:271c::/64
  networkType: OVNKubernetes



Actual results:

Masters missing IPv6 address in KUBELET_NODE_IPS

Install fails with

time="2023-08-30T19:54:19Z" level=error msg="failed to initialize the cluster: Cluster operators authentication, console, ingress, monitoring are not available"

Expected results:

Both IPv4 and IPv6 address in KUBELET_NODE_IPS

Install succeeds

Additional info:

Do we set ipv6.may-fail with NetworkManager?

This is a clone of issue OCPBUGS-22113. The following is the description of the original issue:

Description of problem:

 

Version-Release number of selected component (if applicable):

4.14.0 and 4.15.0

How reproducible:

Every time.

Steps to Reproduce:

1. git clone https://github.com/openshift/installer.git
2. export TAGS=aro
3. hack/build.sh
4. export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="${RELEASE_IMAGE}"
5. export OPENSHIFT_INSTALL_INVOKER="ARO"
6. Run ccoctl to generate ID resources
7. ./openshift-install create manifests
8. ./openshift-install create cluster --log-level=debug 

Actual results:

azure-cloud-provider gets generated with aadClientId = service principal clientID used by the installer.

Expected results:

This step should be skipped and kube-controller-manager should rely on file assets.

Additional info:

Open pull request: https://github.com/openshift/installer/pull/7608

Description of problem:

After the changes of OCPBUGS-3036 and OCPBUGS-11596, a user who has project admin permission is able to check all the subscription information on the operator details page. But currently the InstallPlan information is shown as "None" on the page, which is incorrect.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-05-03-163151

How reproducible:

Always 

Steps to Reproduce:

1. Configure IDP. add a user
2. Install any operator in specific namespace 
3. Assign project admin permission to the user for the same namespace
   $ oc adm policy add-role-to-user admin <username> -n <projectname>
4. Check user have enough permission to check installplan via CLI
   $ oc get clusterrole admin -o yaml | grep -C10 installplan
     - apiGroups:
       - operators.coreos.com
       resources:
       - clusterserviceversions
       - catalogsources
       - installplans
       - subscriptions
       verbs:
       - delete
     - apiGroups:
       - operators.coreos.com
       resources:
       - clusterserviceversions
       - catalogsources
       - installplans
       - subscriptions
       - operatorgroups
       verbs:
       - get
       - list
       - watch
5. Login to OCP with the user, and go to the InstallPlan page; the user is able to check the InstallPlan list without any error
   /k8s/ns/<projectname>/operators.coreos.com~v1alpha1~InstallPlan
6. Navigate to OperatorDetails -> Subscription Tab, and check whether the 'InstallPlan' name is shown on the page

Actual results:

Only 'None' is shown in the InstallPlan section 

Expected results:

The InstallPlan name should be shown on the Subscription page 

Additional info:

 

This PR will allow the installation of non-latest Operator channels and associated versions. https://github.com/openshift/console/pull/12743

When a version is installed that is not the `currentCSV` default version for a channel, the data returns `installed: false` and `installState: "Not Installed"`.

So the UI doesn't place an "Installed" label on the operator card in OperatorHub and the user doesn't see that it's already installed when viewing the operator details.

 

Version-Release number of selected component (if applicable):

4.14 cluster

 

Steps to Reproduce: 

  1. In OperatorHub select Data Grid operator and install version 8.4.3.
  2. Once installed, go into OperatorHub and select the Data Grid operator card. Note there isn't an "Installed" label on the card. 
  3. Select the Data Grid card; once open, it should show that the operator is installed, with a link to the installed version. 

 

Animated screen gif of installed Data Grid version 8.4.3, the default latest version is 8.4.4

https://drive.google.com/file/d/1KVMCdflBYsI3yiLf2oQv69MoStgA5kof/view?usp=sharing

 

Actual results:

obj data returns `installState: "Not Installed"` and `installed: false`

Expected results:

obj data returns `installState: "Installed"` and `installed: true`

 

Additional info:

Requires 4.14 cluster to support installing previous versions and channels

There is a workloads change that introduces the DeploymentConfigs and Builds APIs as capabilities, which gives the cluster admin the option to enable/disable each of these APIs.

In case the DeploymentConfigs capability is disabled, we should remove the `Deployment Config` subsection from the `Workloads` nav section.

In case the Builds capability is disabled, we should remove the `Builds` and `Build Configs` subsections from the `Workloads` nav section.
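For context, a hedged sketch of what disabling these capabilities at install time could look like in install-config.yaml, assuming the capability names Build and DeploymentConfig (the enabled list shown is only an example):

capabilities:
  baselineCapabilitySet: None
  additionalEnabledCapabilities:
  - MachineAPI
  - marketplace
  - openshift-samples
  # Build and DeploymentConfig are intentionally not listed, so their APIs are disabled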

 

Description of problem:

The commit "UPSTREAM: <carry>: Force using host go always and use host libraries" introduced a build failure for the Windows kubelet that is showing up only in release-4.11 for an unknown reason, but could potentially occur on other releases too.

Version-Release number of selected component (if applicable):

WMCO version: 9.0.0 and below
 

How reproducible:

Always on release-4.11
 

Steps to Reproduce:

1. Clone the WMCO repo
2. Build the WMCO image

Actual results:

WMCO image build fails

Expected results:

 WMCO image build should succeed

Description of problem:

While investigating issue [1] we've noticed a few problems with CNO error reporting on the ClusterOperator status [2]:

that's fine, but I think there are a couple bugs to write up:
1. when a panic happens, the operator doesn't go Degraded. This can definitely be done
2. when status cannot be updated, the operator should go degraded
3. when service network and/or clusternetwork in status is missing, the operator should go Available=false.

[1] https://github.com/openshift/cluster-network-operator/pull/1669
[2] https://coreos.slack.com/archives/CB48XQ4KZ/p1671207248527519?thread_ts=1671197854.825529&cid=CB48XQ4KZ

Version-Release number of selected component (if applicable):

 4.13 and previous.

How reproducible:

 Always

Steps to Reproduce:

1. Cause a deliberate panic e.g. in the bootstrap code.

Actual results:

 Operator keeps getting restarted and is not Degraded.

Expected results:

 Operator goes Degraded.

Additional info:


Description of problem:

clusteroperator/network is degraded after running

    FEATURES_ENVIRONMENT="ci" make feature-deploy-on-ci

from openshift-kni/cnf-features-deploy against IPI clusters with OCP 4.13 and 4.14 in CI jobs from Telco 5G DevOps/CI.

Details for a 4.13 job:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/42141/rehearse-42141-periodic-ci-openshift-release-master-nightly-4.13-e2e-telco5g/1689935408508440576

Details for a 4.14 job:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/42141/rehearse-42141-periodic-ci-openshift-release-master-nightly-4.14-e2e-telco5g/1689935408541995008

For example, go to artifacts/e2e-telco5g/telco5g-gather-pao/build-log.txt and it will report:

Error from server (BadRequest): container "container-00" in pod "cnfdu5-worker-0-debug" is waiting to start: ContainerCreating
Running gather-pao for T5CI_VERSION=4.13
Running for CNF_BRANCH=master
Running PAO must-gather with tag pao_mg_tag=4.12
[must-gather      ] OUT Using must-gather plug-in image: quay.io/openshift-kni/performance-addon-operator-must-gather:4.12-snapshot
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: 60503edf-ecc6-48f7-b6a6-f4dc34842803
ClusterVersion: Stable at "4.13.0-0.nightly-2023-08-10-021434"
ClusterOperators:
	clusteroperator/network is degraded because DaemonSet "/openshift-multus/dhcp-daemon" rollout is not making progress - pod dhcp-daemon-7lmlq is in CrashLoopBackOff State
DaemonSet "/openshift-multus/dhcp-daemon" rollout is not making progress - pod dhcp-daemon-95tzb is in CrashLoopBackOff State
DaemonSet "/openshift-multus/dhcp-daemon" rollout is not making progress - pod dhcp-daemon-hfxkd is in CrashLoopBackOff State
DaemonSet "/openshift-multus/dhcp-daemon" rollout is not making progress - pod dhcp-daemon-mhwtp is in CrashLoopBackOff State
DaemonSet "/openshift-multus/dhcp-daemon" rollout is not making progress - pod dhcp-daemon-q7gfb is in CrashLoopBackOff State
DaemonSet "/openshift-multus/dhcp-daemon" rollout is not making progress - last change 2023-08-11T10:54:10Z

Version-Release number of selected component (if applicable):

branch release-4.13 from https://github.com/openshift-kni/cnf-features-deploy.git for OCP 4.13
branch master from https://github.com/openshift-kni/cnf-features-deploy.git for OCP 4.14

How reproducible:

Always.

Steps to Reproduce:

1. Install OCP 4.13 or OCP 4.14 with IPI on 3x masters, 2x workers.
2. Clone https://github.com/openshift-kni/cnf-features-deploy.git
3. FEATURES_ENVIRONMENT="ci" make feature-deploy-on-ci
4. oc wait nodes --all --for=condition=Ready=true --timeout=10m
5. oc wait clusteroperators --all --for=condition=Progressing=false --timeout=10m

Actual results:

See above.

Expected results:

All clusteroperators have finished progressing.

Additional info:

Without 'FEATURES_ENVIRONMENT="ci" make feature-deploy-on-ci' the steps to reproduce above work as expected.

This is a clone of issue OCPBUGS-16796. The following is the description of the original issue:

Description of problem:

 

Observation from CISv1.4 pdf:
1.1.1 Ensure that the API server pod specification file permissions are set to 600 or more restrictive



“Ensure that the API server pod specification file has permissions of 600 or more restrictive.
OpenShift 4 deploys two API servers: the OpenShift API server and the Kube API server. The OpenShift API server delegates requests for Kubernetes objects to the Kube API server.
The OpenShift API server is managed as a deployment. The pod specification yaml for openshift-apiserver is stored in etcd.
The Kube API Server is managed as a static pod. The pod specification file for the kube-apiserver is created on the control plane nodes at /etc/kubernetes/manifests/kube-apiserver-pod.yaml. The kube-apiserver is mounted via hostpath to the kube-apiserver pods via /etc/kubernetes/static-pod-resources/kube-apiserver-pod.yaml with permissions 600.”
 
To conform with the CIS benchmarks, the permissions of the pod specification file for the kube-apiserver, /etc/kubernetes/static-pod-resources/kube-apiserver-pod.yaml, should be updated to 600.

$ for i in $( oc get pods -n openshift-kube-apiserver -l app=openshift-kube-apiserver -o name )
do                 
oc exec -n openshift-kube-apiserver $i -- \
stat -c %a /etc/kubernetes/static-pod-resources/kube-apiserver-pod.yaml
done
644
644
644

 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-20-215234

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

The permission of the pod specification file for the kube-apiserver is 644.

Expected results:

The permission of the pod specification file for the kube-apiserver should be updated to 600.

Additional info:

PR: https://github.com/openshift/library-go/commit/19a42d2bae8ba68761cfad72bf764e10d275ad6e

 

Description of problem:

It seems that we don't correctly update the network data secret version in the PreprovisioningImage, resulting in BMO assuming that the image is still stale, while the image-customization-controller assumes it's done. As a result, the host is stuck in inspecting.

How reproducible:

What I think I did was add a network data secret to a host which already had a PreprovisioningImage previously created. I need to check if I can repeat it.

Actual results:

Host in inspecting, BMO logs show

{"level":"info","ts":"2023-05-11T11:52:52.348Z","logger":"controllers.BareMetalHost","msg":"network data in pre-provisioning image is out of date","baremetalhost":"openshift-machine-api/oste
st-extraworker-0","provisioningState":"inspecting","latestVersion":"9055823","currentVersion":"9055820"}

Indeed, the image has the old version:

status:
  architecture: x86_64
  conditions:
  - lastTransitionTime: "2023-05-11T11:27:51Z"
    message: Generated image
    observedGeneration: 1
    reason: ImageSuccess
    status: "True"
    type: Ready
  - lastTransitionTime: "2023-05-11T11:27:51Z"
    message: ""
    observedGeneration: 1
    reason: ImageSuccess
    status: "False"
    type: Error
  format: iso
  imageUrl: http://metal3-image-customization-service.openshift-machine-api.svc.cluster.local/231b39d5-1b83-484c-9096-aa87c56a222a
  networkData:
    name: ostest-extraworker-0-network-config-secret
    version: "9055820"

What I find puzzling is that we even have two versions of the secret. I only created it once.

Description of problem: Multus currently uses a certificate that is valid for 10 minutes; we need to add configuration to support certificates that are valid for 24 hours

Please review the following PR: https://github.com/openshift/kubernetes-kube-storage-version-migrator/pull/192

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

IPI installation to a shared VPC with 'credentialsMode: Manual' failed because there are no IAM service accounts for control-plane machines and compute machines

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-04-18-005127

How reproducible:

Always

Steps to Reproduce:

1. "create install-config", and then insert interested settings in install-config.yaml
2. "create manifests"
3. run "ccoctl" to create the required credentials
4. grant the above IAM service accounts the required permissions in the host project (see https://github.com/openshift/openshift-docs/pull/58474)
5. "create cluster" 

Actual results:

The installer doesn't create the 2 IAM service accounts, one for control-plane machines and another for compute machines, so no compute machines get created, which leads to installation failure.

Expected results:

The installation should succeed.

Additional info:

FYI https://issues.redhat.com/browse/OCPBUGS-11605
$ gcloud compute instances list --filter='name~jiwei-0418'
NAME                        ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP  STATUS
jiwei-0418a-9kvlr-master-0  us-central1-a  n2-standard-4               10.0.0.62                 RUNNING
jiwei-0418a-9kvlr-master-1  us-central1-b  n2-standard-4               10.0.0.58                 RUNNING
jiwei-0418a-9kvlr-master-2  us-central1-c  n2-standard-4               10.0.0.29                 RUNNING
$ gcloud iam service-accounts list --filter='email~jiwei-0418'
DISPLAY NAME                                                     EMAIL                                                                DISABLED
jiwei-0418a-14589-openshift-image-registry-gcs                   jiwei-0418a--openshift-i-zmwwh@openshift-qe.iam.gserviceaccount.com  False
jiwei-0418a-14589-openshift-machine-api-gcp                      jiwei-0418a--openshift-m-5cc5l@openshift-qe.iam.gserviceaccount.com  False
jiwei-0418a-14589-cloud-credential-operator-gcp-ro-creds         jiwei-0418a--cloud-crede-p8lpc@openshift-qe.iam.gserviceaccount.com  False
jiwei-0418a-14589-openshift-gcp-ccm                              jiwei-0418a--openshift-g-bljz6@openshift-qe.iam.gserviceaccount.com  False
jiwei-0418a-14589-openshift-ingress-gcp                          jiwei-0418a--openshift-i-rm4vz@openshift-qe.iam.gserviceaccount.com  False
jiwei-0418a-14589-openshift-cloud-network-config-controller-gcp  jiwei-0418a--openshift-c-6dk7g@openshift-qe.iam.gserviceaccount.com  False
jiwei-0418a-14589-openshift-gcp-pd-csi-driver-operator           jiwei-0418a--openshift-g-pjn24@openshift-qe.iam.gserviceaccount.com  False
$

 

Description of problem:
Control plane upgrades take about 23 minutes on average. The shortest time I saw was 14 minutes, and the longest was 43 minutes.
The requirement is < 10 min for a successful complete control plane upgrade.

Version-Release number of selected component (if applicable): 4.12.12

How reproducible:
100 %

Steps to Reproduce:

1. Install a hosted cluster on 4.12.12. Wait for it to be 'ready'.
2. Upgrade the control plane to 4.12.13 via OCM.

Actual results: upgrade completes on average after 23 minutes.

Expected results: upgrade completes after < 10 min

Additional info:

N/A

This is a clone of issue OCPBUGS-19444. The following is the description of the original issue:

When we Load the AgentClusterInstall manifest from disk, we sometimes make changes to it.

e.g. after the fix for OCPBUGS-7495 we rewrite any lowercase platform name to mixed case, because for a while we required lowercase even when mixed case is correct.

In 4.14, we set the userManagedNetworking to true when platform:none is used, even if the user didn't specify it in the ZTP manifests, because the controller in ZTP similarly defaults it.

However, these changes aren't taking effect, because they aren't passed through to the manifest that is included in the Agent ISO.
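As a rough sketch of the expected outcome (metadata values are placeholders; field names follow the AgentClusterInstall API), the manifest included in the Agent ISO should end up carrying the defaulted value:

apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
  name: example
  namespace: example
spec:
  platformType: None
  networking:
    userManagedNetworking: true   # defaulted for platform "none", but currently not propagated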

Description of problem:

The configured accessTokenInactivityTimeout under tokenConfig in HostedCluster doesn't have any effect.
1. The value is not getting updated in the oauth-openshift configmap.
2. HostedCluster allows the user to set an accessTokenInactivityTimeout value < 300s, whereas in a master cluster the value should be > 300s.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1. Install a fresh 4.13 hypershift cluster  
2. Configure accessTokenInactivityTimeout as below:
$ oc edit hc -n clusters
...
  spec:
    configuration:
      oauth:
        identityProviders:
        ...
        tokenConfig:          
          accessTokenInactivityTimeout: 100s
...
3. Check the hcp:
$ oc get hcp -oyaml
...
        tokenConfig:           
          accessTokenInactivityTimeout: 1m40s
...

4. Login to guest cluster with testuser-1 and get the token
$ oc login https://a8890bba21c9b48d4a05096eee8d4edd-738276775c71fb8f.elb.us-east-2.amazonaws.com:6443 -u testuser-1 -p xxxxxxx
$ TOKEN=`oc whoami -t`
$ oc login --token="$TOKEN"
WARNING: Using insecure TLS client config. Setting this option is not supported!
Logged into "https://a8890bba21c9b48d4a05096eee8d4edd-738276775c71fb8f.elb.us-east-2.amazonaws.com:6443" as "testuser-1" using the token provided.
You don't have any projects. You can try to create a new project, by running
    oc new-project <projectname>

Actual results:

1. HostedCluster allows the user to set a value < 300s for accessTokenInactivityTimeout, which is not possible on a master cluster.

2. The value is not updated in oauth-openshift configmap:
$ oc get cm oauth-openshift -oyaml -n clusters-hypershift-ci-25785 
...
      tokenConfig:
        accessTokenMaxAgeSeconds: 86400
        authorizeTokenMaxAgeSeconds: 300
...

3. Login doesn't fail even if the user is not active for more than the set accessTokenInactivityTimeout seconds.

Expected results:

Login fails if the user is not active within the accessTokenInactivityTimeout seconds.

Description of problem:
The RHDP-Developer/DXP team wants to deep-link some catalog pages with a filter on the Developer Sandbox cluster. The target page was shown without any query parameter when the user wasn't logged in.

Version-Release number of selected component (if applicable):
At least 4.13 (Dev Sandbox clusters run 4.13.13 currently.)

How reproducible:
Always when not logged in

Steps to Reproduce:

  1. Login
  2. Switch to Developer perspective
  3. Navigate to Add > Developer Catalog > Builder Images > Add filter for ".NET" (for example)
    1. Users are applied to different clusters, so the exact URL isn't known, but the Path and Query parameters should look like this:
      /catalog/ns/cjerolim-dev?catalogType=BuilderImage&keyword=.NET
    2. Save the full URL incl. these query parameters
  4. Logout
  5. Enter the full URL from above
  6. Login

Actual results:
The Developer Catalog is opened, but the catalog type "Builder Images" and keyword filter ".NET" are not applied.

All Developer Catalog items are shown.

Expected results:
The Developer Catalog should open with the catalog type "Builder Images" and the keyword filter ".NET" applied.

Exactly one catalog item should be shown.

Additional info:

Description of the problem:

Staging: the ignition override test was passing successfully before; it looks like in the latest code the returned API error code changed to 500 (Internal Server Error).

Before that we had a 400 API error code.

 

 

(Pdb++) cluster.patch_discovery_ignition(ignition=ignition_override)
 'image_type': None,
 'kernel_arguments': None,
 'proxy': None,
 'pull_secret': None,
 'ssh_authorized_key': None,
 'static_network_config': None}     (/home/benny/assisted-test-infra/src/service_client/assisted_service_api.py:169)
*** assisted_service_client.rest.ApiException: (500)
Reason: Internal Server Error
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'vary': 'Accept-Encoding,Origin', 'date': 'Sun, 11 Jun 2023 04:26:53 GMT', 'content-length': '141', 'x-envoy-upstream-service-time': '1538', 'server': 'envoy', 'set-cookie': 'bd0de3dae0f495ebdb32e3693e2b9100=de3a34d29f1e78d0c404b6c5e84b502b; path=/; HttpOnly; Secure; SameSite=None'})
HTTP response body: {"code":"500","href":"","id":500,"kind":"Error","reason":"The ignition archive size (365 KiB) is over the maximum allowable size (256 KiB)"}
Traceback (most recent call last):
  File "/home/benny/assisted-test-infra/src/assisted_test_infra/test_infra/helper_classes/cluster.py", line 501, in patch_discovery_ignition
    self._infra_env.patch_discovery_ignition(ignition_info=ignition)
  File "/home/benny/assisted-test-infra/src/assisted_test_infra/test_infra/helper_classes/infra_env.py", line 116, in patch_discovery_ignition
    self.api_client.patch_discovery_ignition(infra_env_id=self.id, ignition_info=ignition_info)
  File "/home/benny/assisted-test-infra/src/service_client/assisted_service_api.py", line 407, in patch_discovery_ignition
    self.update_infra_env(infra_env_id=infra_env_id, infra_env_update_params=infra_env_update_params)
  File "/home/benny/assisted-test-infra/src/service_client/assisted_service_api.py", line 170, in update_infra_env
    self.client.update_infra_env(infra_env_id=infra_env_id, infra_env_update_params=infra_env_update_params)
  File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/api/installer_api.py", line 1696, in update_infra_env
    (data) = self.update_infra_env_with_http_info(infra_env_id, infra_env_update_params, **kwargs)  # noqa: E501
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/api/installer_api.py", line 1767, in update_infra_env_with_http_info
    return self.api_client.call_api(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/api_client.py", line 325, in call_api
    return self.__call_api(resource_path, method,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/api_client.py", line 157, in __call_api
    response_data = self.request(
                    ^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/api_client.py", line 383, in request
    return self.rest_client.PATCH(url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/rest.py", line 289, in PATCH
    return self.request("PATCH", url,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.0/lib/python3.11/site-packages/assisted_service_client/rest.py", line 228, in request
    raise ApiException(http_resp=r)
(Pdb++) 
 
 

How reproducible:

Always

 

Steps to reproduce:

Run test: 
test_discovery_ignition_exceed_size_limit
Actual results:

Returns error 500

Expected results:

Error 400

Please review the following PR: https://github.com/openshift/cluster-cloud-controller-manager-operator/pull/235

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-16776.

This is a clone of issue OCPBUGS-19052. The following is the description of the original issue:

Description of problem:

With OCPBUGS-18274 we had to update the etcdctl binary. Unfortunately the script does not attempt to update the binary if it's found in the path already:

https://github.com/openshift/cluster-etcd-operator/blob/master/bindata/etcd/etcd-common-tools#L16-L24

This causes confusion as the binary might not be the latest that we're shipping with etcd.

Pulling the binary shouldn't be a big deal; etcd is running locally anyway and the local image should already be cached just fine. We should always replace the binary.

Version-Release number of selected component (if applicable):

any currently supported release

How reproducible:

always

Steps to Reproduce:

1. run cluster-backup.sh to download the binary
2. update the etcd image (take a different version or so)
3. run cluster-backup.sh again 

Actual results:

cluster-backup.sh will simply print "etcdctl is already installed"

Expected results:

etcdctl should always be pulled

Additional info:

 

This is a clone of issue OCPBUGS-22225. The following is the description of the original issue:

The Samples operator in OKD refers to docker.io/openshift/wildfly images, which are no longer available. Library sync should update samples to use quay.io links.

The cluster-kube-apiserver-operator CI has been constantly failing for the past week, more specifically the e2e-gcp-operator job, because the test cluster ends up in a state where a lot of requests start failing with "Unauthorized" errors.

This caused multiple operators to become degraded and tests to fail.

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/1450/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-gcp-operator/1631333936435040256

Looking at the failures and a must-gather we were able to capture inside of a test cluster, it turned out that the service account issuer could be the culprit here. Because of that we opened https://issues.redhat.com/browse/API-1549.

However, it turned out that disabling TestServiceAccountIssuer didn't resolve the issue and the cluster was still too unstable for the tests to pass.

In a separate attempt we also tried disabling TestBoundTokenSignerController and this time the tests were passing. However, the cluster was still very unstable during the e2e run and the kube-apiserver-operator went degraded a couple of times: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/1455/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-gcp-operator/1632871645171421184/artifacts/e2e-gcp-operator/gather-extra/artifacts/pods/openshift-kube-apiserver-operator_kube-apiserver-operator-5cf9d4569-m2spq_kube-apiserver-operator.log.

On top of that instead of seeing Unauthorized errors, we are now seeing a lot of connection refused.

Description of problem:

Cluster Network Operator managed component multus-admission-controller does not conform to Hypershift control plane expectations.

When CNO is managed by Hypershift, multus-admission-controller must run with a non-root security context. If Hypershift runs the control plane on a Kubernetes (as opposed to OpenShift) management cluster, it adds a pod or container security context with a runAsUser clause to most deployments.

In Hypershift CPO, the security context of deployment containers, including CNO, is set when it detects that SCC's are not available, see https://github.com/openshift/hypershift/blob/9d04882e2e6896d5f9e04551331ecd2129355ecd/support/config/deployment.go#L96-L100. In such a case CNO should do the same, set security context for its managed deployment multus-admission-controller to meet Hypershift standard.

 

How reproducible:

Always

Steps to Reproduce:

1.Create OCP cluster using Hypershift using Kube management cluster
2.Check pod security context of multus-admission-controller

Actual results:

no pod security context is set

Expected results:

pod security context is set with runAsUser: xxxx
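For illustration only, the expected deployment pod template would carry something like the following (the UID shown is a placeholder, not the value actually assigned by the management cluster):

# excerpt of the multus-admission-controller pod template
spec:
  securityContext:
    runAsUser: 1001   # placeholder UID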

Additional info:

This is the highest priority item from https://issues.redhat.com/browse/OCPBUGS-7942 and it needs to be fixed ASAP as it is a security issue preventing IBM from releasing Hypershift-managed Openshift service.

Description of problem:

The current version of openshift/cluster-ingress-operator vendors Kubernetes 1.26 packages. OpenShift 4.14 is based on Kubernetes 1.27.   

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Check https://github.com/openshift/cluster-ingress-operator/blob/release-4.14/go.mod 

Actual results:

Kubernetes packages (k8s.io/api, k8s.io/apimachinery, and k8s.io/client-go) are at version v0.26

Expected results:

Kubernetes packages are at version v0.27.0 or later.

Additional info:

Using old Kubernetes API and client packages brings risk of API compatibility issues.
controller-runtime will need to be bumped to v0.15 as well

Description of problem:

"No datapoints found" for the Long Running Requests by Resource and Long Running Requests by Instance panels of the "API Performance" dashboard in the web-console UI

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-13-223353

How reproducible:

always

Steps to Reproduce:

1. Install an OCP cluster with a 4.14 nightly payload
2. Open the web console and view the "API Performance" dashboard

Actual results:

1. On the Long Running Requests by Resource and Long Running Requests by Instance panels, "No datapoints found" is shown

Expected results:

2. Something should be shown on the Long Running Requests by Resource and Long Running Requests by Instance panels. 

Additional info:

1. Got the same results on 4.13.
2. The apiserver_longrunning_gauge metric is not found in the Prometheus data, only apiserver_longrunning_requests:

$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep apiserver_longrunning_gauge
no result

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/__name__/values' | jq | grep apiserver_long
    "apiserver_longrunning_requests",

Description of problem:

The current openshift_sdn_pod_operations_latency metric is broken: it is not calculating the actual duration of setup/teardown for the latency metric.
We also need additional metrics to measure the pod latency end to end, so that they give an overall summary of the total processing time spent by the CNI server.

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-18267. The following is the description of the original issue:

Description of problem:

'404: Not Found' is shown on the Knative-serving Details page

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-13-223353

How reproducible:

Always

Steps to Reproduce:

1. Install the 'Serverless' Operator, make sure the operator has been installed successfully, and the Knative Serving instance is created without any error
2. Navigate to Administration -> Cluster Settings -> Global Configuration
3. Go to the Knative-serving Details page and check whether the 404 Not Found message is there

Actual results:

The page shows 404 Not Found

Expected results:

The 404 Not Found page should not be shown

Additional info:

The dependency ticket is OCPBUGS-15008; more information can be found in the comments.

Description of problem:

When creating a pod controller (e.g. a deployment) with a pod spec that will be mutated by SCCs, users might still get a warning about the pod not meeting the given namespace's pod security level.

Version-Release number of selected component (if applicable):

4.11

How reproducible:

100%

Steps to Reproduce:

1. create a namespace with restricted PSa warning level (the default)
2. create a deployment with a pod with an empty security context (see the example below)
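For reference, a deployment of the kind described in step 2 might look like this minimal sketch (name and image are illustrative):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: psa-warning-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: psa-warning-test
  template:
    metadata:
      labels:
        app: psa-warning-test
    spec:
      containers:
      - name: test
        image: registry.access.redhat.com/ubi8/ubi-minimal
        command: ["sleep", "infinity"]
        securityContext: {}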

Actual results:

You get a warning about the deployment's pod not meeting the NS's pod security admission requirements.

Expected results:

No warning if the pod for the deployment would be properly mutated by SCCs in order to fulfill the NS's pod security requirements.

Additional info:

originally implemented as a part of https://issues.redhat.com/browse/AUTH-337

 

Description of problem:

Workload annotations are missing from the deployments. This is in relation to the openshift/platform-operator repo.

Missing annotations:

On the namespace: `workload.openshift.io/allowed: management`

On the deployments: `target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'`. That annotation is required for the admission webhook to modify the resource for workload pinning (an illustrative example follows the enhancement links below). 

Related Enhancements: 
https://github.com/openshift/enhancements/pull/703 
https://github.com/openshift/enhancements/pull/1213
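An illustrative sketch of where the two annotations are expected (resource names are placeholders; the target annotation is shown on the deployment's pod template, which is where the admission webhook picks it up):

apiVersion: v1
kind: Namespace
metadata:
  name: example-platform-operators
  annotations:
    workload.openshift.io/allowed: management
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-operator
  namespace: example-platform-operators
spec:
  template:
    metadata:
      annotations:
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'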

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem

CI is flaky because of test failures such as the following:

{  fail [github.com/openshift/origin/test/extended/oauth/requestheaders.go:218]: full response header: HTTP/1.1 403 Forbidden
Content-Length: 192
Audit-Id: f6026f9b-06c5-4b4a-9414-8dc5c681b45a
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Content-Type: application/json
Date: Tue, 08 Aug 2023 11:26:35 GMT
Expires: 0
Pragma: no-cache
Referrer-Policy: strict-origin-when-cross-origin
X-Content-Type-Options: nosniff
X-Dns-Prefetch-Control: off
X-Frame-Options: DENY
X-Xss-Protection: 1; mode=block

{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"forbidden: User \"system:anonymous\" cannot get path \"/metrics\"","reason":"Forbidden","details":{},"code":403}


Expected
    <string>: 403 Forbidden
to contain substring
    <string>: 401 Unauthorized
Ginkgo exit error 1: exit with code 1}

This particular failure comes from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_openshift-apiserver/380/pull-ci-openshift-openshift-apiserver-master-e2e-aws-ovn-serial/1688848417708576768. Search.ci has other similar failures.

Version-Release number of selected component (if applicable)

I have seen this in 4.14 CI jobs and 4.13 CI jobs.

How reproducible

Presently, search.ci shows the following stats for the past 14 days:

Found in 2.41% of runs (4.36% of failures) across 1078 total runs and 58 jobs (55.38% failed)
pull-ci-openshift-openshift-apiserver-master-e2e-aws-ovn-serial (all) - 25 runs, 40% failed, 20% of failures match = 8% impact
openshift-cluster-network-operator-1874-nightly-4.14-e2e-aws-ovn-serial (all) - 42 runs, 67% failed, 14% of failures match = 10% impact
pull-ci-openshift-kubernetes-master-e2e-aws-ovn-serial (all) - 59 runs, 54% failed, 6% of failures match = 3% impact
pull-ci-openshift-origin-master-e2e-aws-ovn-serial (all) - 434 runs, 66% failed, 2% of failures match = 1% impact
pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-serial (all) - 55 runs, 49% failed, 7% of failures match = 4% impact
pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-serial (all) - 60 runs, 58% failed, 3% of failures match = 2% impact
pull-ci-operator-framework-operator-marketplace-master-e2e-aws-ovn-serial (all) - 24 runs, 38% failed, 22% of failures match = 8% impact
pull-ci-openshift-cluster-network-operator-master-e2e-aws-ovn-serial (all) - 81 runs, 58% failed, 4% of failures match = 2% impact
pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-ovn-serial (all) - 35 runs, 46% failed, 13% of failures match = 6% impact
rehearse-41872-pull-ci-openshift-ovn-kubernetes-release-4.14-e2e-aws-ovn-serial (all) - 1 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-serial (all) - 72 runs, 49% failed, 3% of failures match = 1% impact
pull-ci-openshift-cluster-kube-apiserver-operator-release-4.13-e2e-aws-ovn-serial (all) - 4 runs, 75% failed, 33% of failures match = 25% impact
pull-ci-openshift-cluster-dns-operator-master-e2e-aws-ovn-serial (all) - 19 runs, 63% failed, 8% of failures match = 5% impact

Steps to Reproduce

1. Post a PR and have bad luck.
2. Check search.ci using the link above.

Actual results

CI fails.

Expected results

CI passes, or fails on some other test failure.

This is a clone of issue OCPBUGS-19080. The following is the description of the original issue:

Description of problem:

Attempted an upgrade of 3480 SNOs that were deployed with 4.13.11 to 4.14.0-rc.0, and 15 SNOs ended up stuck in a partial upgrade because the cluster console operator was not available

# cat 4.14.0-rc.0-partial.console | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion --no-headers"
vm00255 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm00320 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm00327 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm00405 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm00705 version   4.13.11   True   True   21h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm01224 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm01310 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm01320 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm01928 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm02052 version   4.13.11   True   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm02588 version   4.13.11   True   True   17h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm02704 version   4.13.11   True   True   17h   Unable to apply 4.14.0-rc.0: wait has exceeded 40 minutes for these operators: console
vm02835 version   4.13.11   True   True   17h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm03110 version   4.13.11   True   True   15h   Unable to apply 4.14.0-rc.0: the cluster operator console is not available
vm03322 version   4.13.11   True   True   15h   Unable to apply 4.14.0-rc.0: wait has exceeded 40 minutes for these operators: console

Version-Release number of selected component (if applicable):

SNO OCP (managed clusters being upgraded) 4.13.11 upgraded to 4.14.0-rc.0
Hub OCP 4.13.12
ACM - 2.9.0-DOWNSTREAM-2023-09-07-04-47-52

How reproducible:

15 out of the 3489 SNOs being upgraded; however, these represented 15 out of the 41 partial upgrade failures (~36% of the failures)

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-api-provider-aws/pull/459

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

ovn-ipsec pods crash when the IPsec North-South extension/service is enabled on any $ROLE nodes.

The IPsec extension and service were enabled for 2 workers only, and their corresponding ovn-ipsec pods are in CrashLoopBackOff.


[root@dell-per740-36 ipsec]# oc get pods 
NAME                                       READY   STATUS             RESTARTS         AGE
dell-per740-14rhtsengpek2redhatcom-debug   1/1     Running            0                3m37s
ovn-ipsec-bptr6                            0/1     CrashLoopBackOff   26 (3m58s ago)   130m
ovn-ipsec-bv88z                            1/1     Running            0                3h5m
ovn-ipsec-pre414-6pb25                     1/1     Running            0                3h5m
ovn-ipsec-pre414-b6vzh                     1/1     Running            0                3h5m
ovn-ipsec-pre414-jzwcm                     1/1     Running            0                3h5m
ovn-ipsec-pre414-vgwqx                     1/1     Running            3                132m
ovn-ipsec-pre414-xl4hb                     1/1     Running            3                130m
ovn-ipsec-qb2bj                            1/1     Running            0                3h5m
ovn-ipsec-r4dfw                            1/1     Running            0                3h5m
ovn-ipsec-xhdpw                            0/1     CrashLoopBackOff   28 (116s ago)    132m
ovnkube-control-plane-698c9845b8-4v58f     2/2     Running            0                3h5m
ovnkube-control-plane-698c9845b8-nlgs8     2/2     Running            0                3h5m
ovnkube-control-plane-698c9845b8-wfkd4     2/2     Running            0                3h5m
ovnkube-node-l6sr5                         8/8     Running            27 (66m ago)     130m
ovnkube-node-mj8bs                         8/8     Running            27 (75m ago)     132m
ovnkube-node-p24x8                         8/8     Running            0                178m
ovnkube-node-rlpbh                         8/8     Running            0                178m
ovnkube-node-wdxbg                         8/8     Running            0                178m
[root@dell-per740-36 ipsec]# 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-12-024050

How reproducible:

Always

Steps to Reproduce:

1. Install an OVN IPsec cluster (East-West)
2. Enable the IPsec OS extension for North-South
3. Enable the IPsec service for North-South

Actual results:

ovn-ipsec pods in CLBO state

Expected results:

All pods under ovn-kubernetes ns should be Running fine

Additional info:

One of the ovn-ipsec CLBO pods logs

# oc logs ovn-ipsec-bptr6
Defaulted container "ovn-ipsec" out of: ovn-ipsec, ovn-keys (init)
+ rpm --dbpath=/usr/share/rpm -q libreswan
libreswan-4.9-4.el9_2.x86_64
+ counter=0
+ '[' -f /etc/cni/net.d/10-ovn-kubernetes.conf ']'
+ echo 'ovnkube-node has configured node.'
ovnkube-node has configured node.
+ ip x s flush
+ ip x p flush
+ ulimit -n 1024
+ /usr/libexec/ipsec/addconn --config /etc/ipsec.conf --checkconfig
+ /usr/libexec/ipsec/_stackmanager start
+ /usr/sbin/ipsec --checknss
+ /usr/libexec/ipsec/pluto --leak-detective --config /etc/ipsec.conf --logfile /var/log/openvswitch/libreswan.log
FATAL ERROR: /usr/libexec/ipsec/pluto: lock file "/run/pluto/pluto.pid" already exists
leak: string logger, item size: 48
leak: string logger prefix, item size: 27
leak detective found 2 leaks, total size 75

journalctl -u ipsec here: https://privatebin.corp.redhat.com/?216142833d016b3c#2Es8ACSyM3VWvwi85vTaYtSx8X3952ahxCvSHeY61UtT

This is a clone of issue OCPBUGS-27094. The following is the description of the original issue:

Description of problem:

Based on this and this component readiness data comparing success rates for those two particular tests, we are regressing ~7-10% between the current 4.15 master and 4.14.z (in other words, we made the product ~10% worse).

 

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-upi-serial/1720630313664647168

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-serial/1719915053026643968

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-upi-serial/1721475601161785344

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-serial/1724202075631390720

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-upi-serial/1721927613917696000

These jobs and their failures are all caused by increased etcd leader elections disrupting seemingly unrelated test cases across the VSphere AMD64 platform.

Since this particular platform's business significance is high, I'm setting this as "Critical" severity.

Please get in touch with me or Dean West if more teams need to be pulled into investigation and mitigation.

 

Version-Release number of selected component (if applicable):

4.15 / master

How reproducible:

Component Readiness Board

Actual results:

The etcd leader elections are elevated. Some jobs indicate it is due to disk i/o throughput OR network overload. 

Expected results:

1. We NEED to understand what is causing this problem.
2. If we can mitigate this, we should.
3. If we cannot mitigate this, we need to document this or work with VSphere infrastructure provider to fix this problem.
4. We optionally need a way to measure how often this happens in our fleet so we can evaluate how bad it is.

Additional info:

 

Description of problem:

On a HyperShift cluster that has public certs for OAuth configured, the console reports an x509 certificate error when attempting to display a token

Version-Release number of selected component (if applicable):

4.12.z

How reproducible:

always

Steps to Reproduce:

1. Create a hosted cluster configured with a letsencrypt certificate for the oauth endpoint.
2. Go to the console of the hosted cluster. Click on the user icon and get token.

Actual results:

The console displays an oauth cert error

Expected results:

The token displays

Additional info:

The hcco reconciles the oauth cert into the console namespace. However, it is only reconciling the self-signed one and not the one that was configured through .spec.configuration.apiserver of the hostedcluster. It needs to detect the actual cert used for oauth and send that one.

 

Description of the problem:

In Staging, UI 2.18.6 - Enable DHCP and then switch to UMN --> BE response "User Managed Networking cannot be set with VIP DHCP Allocation"

How reproducible:

100%

Steps to reproduce:

1. In networking page - enable DHCP

2. Switch to UMN

3. BE response with "User Managed Networking cannot be set with VIP DHCP Allocation"

Actual results:

 

Expected results:

This is a clone of issue OCPBUGS-18771. The following is the description of the original issue:

Description of problem:

Customer reported that keepalived pods crash and fail to start on worker nodes (Ingress VIP). The expectation is that the keepalived pod (labeled by app=kni-infra-vrrp) should start. This affects everyone using OCP v4.13 together with Ingress VIP and could be a potential bug in the nodeip-configuration service in v4.13.

More details as below:

-> There are 2 problems in OCP v4.13. The regular expression won't match, and the chroot command will fail because of missing ldd libraries inside the container. This has been fixed in 4.14, but not in 4.13.

-> The nodeip-configuration service creates the /run/nodeip-configuration/remote-worker file based on onPremPlatformAPIServerInternalIPs (apiVIP) and ignores the onPremPlatformIngressIPs (ingressVIP), as can be seen in the source code.

-> Then the keepalived process won't start because the remote-worker file exists.

-> The liveness probes will fail because the keepalived process does not exist.

The fix is quite simple (as highlighted by the customer): the nodeip-configuration.service template needs to be extended to consider the Ingress VIPs as well. This is the source code where the change needs to be made.

As per the following code snippet, the node-ip invocation ranges only over onPremPlatformAPIServerInternalIPs and ignores onPremPlatformIngressIPs (a sketch of the extension follows the snippet).

node-ip \
    set \
    --platform {{ .Infra.Status.PlatformStatus.Type }} \
    {{if not (isOpenShiftManagedDefaultLB .) -}}
    --user-managed-lb \
    {{end -}}
    {{if or (eq .IPFamilies "IPv6") (eq .IPFamilies "DualStackIPv6Primary") -}}
    --prefer-ipv6 \
    {{end -}}
    --retry-on-failure \
    {{ range onPremPlatformAPIServerInternalIPs . }}{{.}} {{end}}; \
    do \
    sleep 5; \
    done" 
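A sketch of the extension (assuming an onPremPlatformIngressIPs template helper analogous to the one used above; the actual fix may look different) would be to range over the Ingress VIPs as well:

    {{ range onPremPlatformAPIServerInternalIPs . }}{{.}} {{end}}\
    {{ range onPremPlatformIngressIPs . }}{{.}} {{end}}; \
    do \
    sleep 5; \
    done"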

The difference between OCP v4.12 and v4.13 related to the keepalived pod is also indicated in the attached image

Version-Release number of selected component (if applicable):

v4.13

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

The keepalived pods crashes and fail to start on worker node (Ingress VIP)

Expected results:

The expectation is that the keepalived pod (labeled by app=kni-infra-vrrp) should start.

Additional info:

 

Description of problem:

Upgrading OCP from 4.11 to 4.12 with Datadog installed gets stuck due to an SCC.

The SCC contains:

seLinuxContext:
  seLinuxOptions:
    level: s0
    role: system_r
    type: spc_t
    user: system_u
  type: MustRunAs


And the error shown is:
~~~
deployment openshift-operator-lifecycle-manager/package-server-manager has a replica failure FailedCreate: pods "package-server-manager-12a3b4cd5e-1x2y3" is forbidden: violates PodSecurity "restricted:v1.24": seLinuxOptions (pod set forbidden securityContext.seLinuxOptions: type "spc_t"; user may not be set; role may not be set)
~~~

 

Version-Release number of selected component (if applicable):

4.11

 

How reproducible:

Upgrading a 4.11 cluster with Datadog installed. The SCC contains:

seLinuxContext:
  seLinuxOptions:
    level: s0
    role: system_r
    type: spc_t
    user: system_u
  type: MustRunAs

Steps to Reproduce:

1. Upgrade a 4.11 cluster to 4.12 with Datadog installed, or with an SCC containing the above `seLinuxOptions`

 

Actual results:

Upgrade is stuck.

 

Expected results:

The Datadog SCC (or customer's custom SCCs) should not affect cluster upgrades.

 

Additional info:

Related KCS [1] [2].

 

[1] https://access.redhat.com/solutions/7027371
[2] https://access.redhat.com/solutions/7023939

Description of the problem:

The base domain contains a double hyphen (`--`), like cat--rahul.com, which is allowed by the UI and BE, and when the node is discovered, network validation fails.

 

The current domain is one particular case of using `--`, but note that the UI and BE allow sending many consecutive `-` characters as part of the domain name.

 

from agent logs:

 

 

Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Creating execution step for ntp-synchronizer ntp-synchronizer-70565cf4 args <[{\"ntp_source\":\"\"}]>" file="step_processor.go:123" request_id=5467e025-2683-4119-a55a-976bb7787279
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Creating execution step for domain-resolution domain-resolution-f3917dea args <[{\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}]>" file="step_processor.go:123" request_id=5467e025-2683-4119-a55a-976bb7787279
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating domain resolution with args [{\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}]" file="action.go:29"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating inventory with args [fea3d7b9-a990-48a6-9a46-4417915072b0]" file="action.go:29"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=error msg="Failed to validate domain resolution: data, {\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}" file="action.go:42" error="validation failure list:\nvalidation failure list:\ndomains.0.domain_name in body should match '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating ntp synchronizer with args [{\"ntp_source\":\"\"}]" file="action.go:29"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Validating free addresses with args [[\"192.168.123.0/24\"]]" file="action.go:29"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- sh -c cp /etc/mtab /root/mtab-fea3d7b9-a990-48a6-9a46-4417915072b0 && podman run --privileged --pid=host --net=host --rm --quiet -v /var/log:/var/log -v /run/udev:/run/udev -v /dev/disk:/dev/disk -v /run/systemd/journal/socket:/run/systemd/journal/socket -v /var/log:/host/var/log:ro -v /proc/meminfo:/host/proc/meminfo:ro -v /sys/kernel/mm/hugepages:/host/sys/kernel/mm/hugepages:ro -v /proc/cpuinfo:/host/proc/cpuinfo:ro -v /root/mtab-fea3d7b9-a990-48a6-9a46-4417915072b0:/host/etc/mtab:ro -v /sys/block:/host/sys/block:ro -v /sys/devices:/host/sys/devices:ro -v /sys/bus:/host/sys/bus:ro -v /sys/class:/host/sys/class:ro -v /run/udev:/host/run/udev:ro -v /dev/disk:/host/dev/disk:ro registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-agent-rhel8:v1.0.0-279 inventory]" file="execute.go:39"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=error msg="Unable to create runner for step <domain-resolution-f3917dea>, args <[{\"domains\":[{\"domain_name\":\"api.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"api-int.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"console-openshift-console.apps.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com.\"},{\"domain_name\":\"validateNoWildcardDNS.dummy---dummy.cat--rahul.com\"},{\"domain_name\":\"quay.io\"}]}]>" file="step_processor.go:126" error="validation failure list:\nvalidation failure list:\ndomains.0.domain_name in body should match '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'" request_id=5467e025-2683-4119-a55a-976bb7787279
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- findmnt --raw --noheadings --output SOURCE,TARGET --target /run/media/iso]" file="execute.go:39"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- sh -c podman ps --format '{{.Names}}' | grep -q '^free_addresses_scanner$' || podman run --privileged --net=host --rm --quiet --name free_addresses_scanner -v /var/log:/var/log -v /run/systemd/journal/socket:/run/systemd/journal/socket registry-proxy.engineering.redhat.com/rh-osbs/openshift4-assisted-installer-agent-rhel8:v1.0.0-279 free_addresses '[\"192.168.123.0/24\"]']" file="execute.go:39"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=info msg="Executing nsenter [--target 1 --cgroup --mount --ipc --net -- timeout 30 chronyc -n sources]" file="execute.go:39"
Aug 28 11:28:55 master-0-0 next_step_runne[1918]: time="28-08-2023 11:28:55" level=warning msg="Sending step <domain-resolution-f3917dea> reply output <> error <validation failure list:\nvalidation failure list:\ndomains.0.domain_name in body should match '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'> exit-code <-1>" file="step_processor.go:76" request_id=5467e025-2683-4119-a55a-976bb7787279

 

 

 

How reproducible:

Create a cluster with the domain cat--rahul.com, using the UI fix that allows it.

Once the node is discovered, network validation fails on:

  • DNS wildcard not configured: DNS wildcard check cannot be performed yet because the host has not yet performed DNS resolution.

Steps to reproduce:

see above

Actual results:

Unable to install cluster due to network validation failure

Expected results:
The domain should be allowed by the regex (i.e. the validation should accept consecutive hyphens in labels)
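The regex from the agent log can be checked directly; any label containing consecutive hyphens fails to match, which is why the domain-resolution step is rejected:

$ echo "api.dummy---dummy.cat--rahul.com" | grep -E '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'
(no output, exit code 1 - rejected)
$ echo "api.dummy-dummy.cat-rahul.com" | grep -E '^([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*[.])+[a-zA-Z]{2,}[.]?$'
api.dummy-dummy.cat-rahul.com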

Please review the following PR: https://github.com/openshift/cluster-dns-operator/pull/363

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Quoting Joel: In 4.14 there's been an effort to make Machine API optional; anything that relies on the CRD needs to be able to detect that the CRD is not installed and then not error should that be the case. You should be able to use a discovery client to determine if the API group is installed or not

We have several controllers and informers that depend on the Machine API being available to list and sync caches with. When the API is not installed at all, the depending controllers are blocked forever and eventually get killed by the liveness probe. That causes hot restart loops that cause installations to fail.
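A quick way to perform the equivalent check from the CLI (the controllers would use the discovery client in Go for the same purpose):

oc api-resources --api-group=machine.openshift.io
# empty output means the API group is not installed, and the depending informers
# should be skipped instead of blocking forever on cache sync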

https://redhat-internal.slack.com/archives/C027U68LP/p1690436286860899

 

Version-Release number of selected component (if applicable):

4.14

How reproducible:

always

Steps to Reproduce:

1. install a machineAPI=false cluster
2. ??? 
3. watch it fail

Some tests may cause unexpected reboots of nodes. On HA setups this is checked by the "should report ready nodes the entire duration of the test run" test, which ensures the Prometheus metric for node readiness didn't flip.

On SNO, however, we can't use the metrics, as Prometheus will go down along with the node and the node will become ready again before Prometheus/kube-state-metrics is up again. For SNO we have to check that the node has the expected number of reboots: the number of "rendered-master"/"rendered-worker" MachineConfigs + 1 (a sketch of such a check follows).
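A sketch of that reboot check (commands are illustrative of the idea, not the actual test code; <node> is a placeholder):

# expected number of boots = rendered MachineConfigs for the node's pool + 1
oc get machineconfig --no-headers | grep -c '^rendered-master'
# observed number of boots on the node (subtract the header line if journalctl prints one)
oc debug node/<node> -- chroot /host journalctl --list-boots | wc -l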

In many cases, the /dev/disk/by-path symlink is the only way to stably identify a disk without having prior knowledge of the hardware from some external source (e.g. a spreadsheet of disk serial numbers). It should be possible to specify this path in the root device hints.
Metal³ now allows these paths in the `name` hint (see OCPBUGS-13080), so the IPI installer's implementation using terraform must be changed to match.
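The end state is that an install-config rootDeviceHints entry such as the following (path and host name are illustrative) is accepted by the IPI installer, as it already is by Metal³:

platform:
  baremetal:
    hosts:
    - name: worker-0
      role: worker
      rootDeviceHints:
        deviceName: /dev/disk/by-path/pci-0000:00:1f.2-ata-1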

Description of problem:

Critical Alert Rules do not have runbook url

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

This bug is being raised by the OpenShift Monitoring team as part of an effort to detect invalid alerting rules in OCP.

1. Check details of KubeSchedulerDown Alert Rule
2.
3.

Actual results:

The Alert Rule KubeSchedulerDown has Critical Severity, but does not have runbook_url annotation.

Expected results:

All critical alerting rules must have a runbook_url annotation

Additional info:

Critical Alerts must have a runbook, please refer to style guide at https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide 

The runbooks are located at github.com/openshift/runbooks

To resolve the bug, 
 - Add runbooks for the relevant Alerts at github.com/openshift/runbooks
 - Add the link to the runbook in the Alert annotation 'runbook_url'
 - Remove the exception in the origin test, added in PR https://github.com/openshift/origin/pull/27933
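For illustration, the fixed rule would carry an annotation along these lines (the runbook path shown is hypothetical until the runbook is actually added to github.com/openshift/runbooks):

- alert: KubeSchedulerDown
  labels:
    severity: critical
  annotations:
    runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-kube-scheduler-operator/KubeSchedulerDown.md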

This is a clone of issue OCPBUGS-20104. The following is the description of the original issue:

Description of problem:

The recently introduced network node identity feature introduces pods that run as root. While it's understood there may be situations where that is absolutely required, the goal should be to always run with least privilege / non-root.

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

1. Deploy an IBM Managed OpenShift 4.14.0 cluster. I suspect any OpenShift 4.14.0 cluster will have these pods running as root as well.

Actual results:

network-node-identity pods are running as root

Expected results:

network-node-identity pods should be running as non-root

Additional info:

Due to the introduction of these pods running as root in an IBM Managed OpenShift 4.14.0 cluster, we will have to file for a security exception.

Currently, the installer has a dependency on the main assisted-service go module. This means that we pull in all of its dependencies, which include libnmstate (the Rust one). In practice, this means that we can't update assisted-service at least until AGENT-139 is implemented. And since the main assisted-service module and the API module should be in lockstep, this means we can't update to pick up recent changes to the ZTP API either.

Context:

We currently convey cloud creds issues in ValidOIDCConfiguration and ValidAWSIdentityProvider conditions.

The HO relies on those https://github.com/openshift/hypershift/blob/9e4127055dd7be9cfe4fc8427c39cee27a86efcd/hypershift-operator/controllers/hostedcluster/internal/platform/aws/aws.go#L293

to decide if forceful deletion should be applied, potentially intentionally leaving resources behind in the cloud (e.g. use case: OIDC creds were broken out of band).

The CPO relies on those to wait for deletion of guest cluster resources https://github.com/openshift/hypershift/blob/8596f7f131169a19c6a67dc6ce078c50467de648/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go#L284-L299

DoD:

When any of the cases above results in the "move kube deletion forward, skipping cloud resource deletion" path, we should send a metric so consumers / SREs have visibility and can use it to notify customers in conjunction with https://issues.redhat.com/browse/SDA-8613

 

Description of the problem:

The assisted-service pod crashloops when kube-api is enabled and the BMH CRD is not installed.

How reproducible:

 100%

Steps to reproduce:

1. Deploy assisted-service with kube-api enabled

2. Either don't create or remove the BMH CRD (if removed you will need to restart the assisted-service pod)

3. Observe assisted-service pod

Actual results:

 After a few minutes assisted-service will crash with a message like:

time="2023-01-12T14:26:03Z" level=fatal msg="failed to run manager" func=main.main.func1 file="/remote-source/assisted-service/app/cmd/main.go:204" error="failed to wait for baremetal-agent-controller caches to sync: timed out waiting for cache to be synced"

Expected results:

Either assisted-service comes up without the BMAC controller and without errors, or it reports a clear error stating that the BMH CRD is required and is missing.
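A simple pre-flight check along these lines could gate starting the BMAC controller (or at least produce the clear error asked for above):

oc get crd baremetalhosts.metal3.io
# NotFound here means BMAC cannot work; assisted-service should skip the controller
# or log a clear error instead of crashlooping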

Description of the problem:

In staging, UI 2.20.6, BE 2.20.1 - not able to turn ODF on; getting "Failed to update the cluster", although according to the support-level API it should be supported

How reproducible:

100%

Steps to reproduce:

1. Create a new OCP 4.13 cluster with P/Z cpu_arch

2. try to enable ODF

3.

Actual results:

 

Expected results:

Description of problem:

On OpenShift Container Platform, the etcd Pod is showing messages like the following:

2023-06-19T09:10:30.817918145Z {"level":"warn","ts":"2023-06-19T09:10:30.817Z","caller":"fileutil/purge.go:72","msg":"failed to lock file","path":"/var/lib/etcd/member/wal/000000000000bc4b-00000000183620a4.wal","error":"fileutil: file already locked"}


This is described in KCS https://access.redhat.com/solutions/7000327

Version-Release number of selected component (if applicable):

any currently supported version (> 4.10) running with 3.5.x

How reproducible:

always

Steps to Reproduce:

happens after running etcd for a while

 

This has been discussed in https://github.com/etcd-io/etcd/issues/15360

It's not a harmful error message, it merely indicates that some WALs have not been included in snapshots yet.

This was caused by changing default numbers: https://github.com/etcd-io/etcd/issues/13889

This was fixed in https://github.com/etcd-io/etcd/pull/15408/files but never backported to 3.5.

To mitigate that error and stop confusing people, we should also supply that argument when starting etcd in: https://github.com/openshift/cluster-etcd-operator/blob/master/bindata/etcd/pod.yaml#L170-L187

That way we're not surprised by changes of the default values upstream.

This is a clone of issue OCPBUGS-14322. The following is the description of the original issue:

Description of problem:

Excessive permissions in web-console impersonating a user

Version-Release number of selected component (if applicable):

4.10.55

How reproducible:

When trying to impersonate a specific user ('99GU8710') in an OCP 4.10.55 cluster, we are able to see pods and logs in the web console, while that user is unable to access those using the command line.

Steps to Reproduce:

1. Create a user with LDAP (example: new_user)
2. Don't give the user access to check pod logs for openshift-related namespaces (for example: new_user should not be able to see pod logs for openshift-apiserver)
3. Try to impersonate the user (new_user)
4. Try to check openshift-apiserver pod logs in the web console while impersonating the user (you will be able to see them)
5. Try to check the same logs from the command line as new_user; you won't be able to see them.

 

Actual results:

The `Impersonate the user` feature doesn't perform the correct validation

Expected results:

We should not be able to see pod logs if the impersonated user does not have permission

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-image-registry-operator/pull/855

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

From a recent PR run of the recovery suite:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-etcd-operator/1049/pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-etcd-recovery/1651162451397316608

> event happened 49 times, something is wrong: ns/openshift-etcd-operator deployment/etcd-operator hmsg/593a6eb603 - pathological/true reason/UnstartedEtcdMember unstarted members: NAME-PENDING-10.0.167.169 From: 10:39:53Z To: 10:39:54Z result=reject 

 

Since the remainder of the test has passed, the event might not be reconciled correctly when a member is coming back in CEO. We should fix this event.

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-installer-master-e2e-metal-ipi-sdn-virtualmedia

Reproduced locally, the failure is:

level=error msg=Attempted to gather debug logs after installation failure: must provide bootstrap host address                                                                               
level=info msg=Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected                
level=info msg=Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected                
level=info msg=Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected                                   
level=info msg=Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected                                   
level=error msg=Cluster operator network Degraded is True with ApplyOperatorConfig: Error while updating operator configuration: could not apply (rbac.authorization.k8s.io/v1, Kind=RoleBinding) openshift-config-managed/openshift-network-public-role-binding: failed to apply / update (rbac.authorization.k8s.io/v1, Kind=RoleBinding) openshift-config-managed/openshift-network-public-role-binding: Patch "https://api-int.ostest.test.metalkube.org:6443/apis/rbac.authorization.k8s.io/v1/namespaces/openshift-config-managed/rolebindings/openshift-network-public-role-binding?fieldManager=cluster-network-operator%2Foperconfig&force=true": dial tcp 192.168.111.5:6443: connect: connection refused

Description of problem:

Similar to OCPBUGS-11636, ccoctl needs to be updated to account for the S3 bucket changes described in https://aws.amazon.com/blogs/aws/heads-up-amazon-s3-security-changes-are-coming-in-april-of-2023/

These changes have rolled out to us-east-2 and the China regions as of today and will roll out to additional regions in the near future

See OCPBUGS-11636 for additional information

Version-Release number of selected component (if applicable):

 

How reproducible:

Reproducible in affected regions.

Steps to Reproduce:

1. Use "ccoctl aws create-all" flow to create STS infrastructure in an affected region like us-east-2. Notice that document upload fails because the s3 bucket is created in a state that does not allow usage of ACLs with the s3 bucket.

Actual results:

./ccoctl aws create-all --name abutchertestue2 --region us-east-2 --credentials-requests-dir ./credrequests --output-dir _output
2023/04/11 13:01:06 Using existing RSA keypair found at _output/serviceaccount-signer.private
2023/04/11 13:01:06 Copying signing key for use by installer
2023/04/11 13:01:07 Bucket abutchertestue2-oidc created
2023/04/11 13:01:07 Failed to create Identity provider: failed to upload discovery document in the S3 bucket abutchertestue2-oidc: AccessControlListNotSupported: The bucket does not allow ACLs
        status code: 400, request id: 2TJKZC6C909WVRK7, host id: zQckCPmozx+1yEhAj+lnJwvDY9rG14FwGXDnzKIs8nQd4fO4xLWJW3p9ejhFpDw3c0FE2Ggy1Yc=

Expected results:

"ccoctl aws create-all" successfully creates IAM and S3 infrastructure. OIDC discovery and JWKS documents are successfully uploaded to the S3 bucket and are publicly accessible.

Additional info:

 

Description of problem:

Pages should have unique page titles, so that we can gather accurate user telemetry data via segment. The page title should differ based on the selected tab.

In order to do proper analysis, branding should not be included in the page title.

Currently the following pages have this title "Red Hat OpenShift Dedicated" (or the respective brand name):
Dev perspective:

  • BuildConfigs
  • Pipelines>Pipelines
  • Pipelines>Repositories
  • Helm>Helm Releases
  • Helm>Repositories
  • Install Helm Chart
    Admin perspective:
  • Pipelines>Pipelines
  • Pipelines>PipelineRuns
  • Pipelines>PipelineResources
  • Pipelines>Repositories
  • Tasks>Tasks
  • Tasks>TaskRuns
  • Tasks>ClusterTasks

The following tabs all have the same page title Observe · Red Hat OpenShift Dedicated:
Dev perspective:

  • Observe>Dashboard
  • Observe>Alerts
  • Observe>Metrics

The following tabs all have the same page title Project Details · Red Hat OpenShift Dedicated:
Dev perspective:

  • Project>Overview
  • Project>Details
  • Project>Project access

All the user preferences tabs have the same page title : User Preferences · Red Hat OpenShift Dedicated

  • User Preferences>General
  • User Preferences>Language
  • User Preferences>Notifications
  • User Preferences>Applications

The Topology page in the Dev Perspective and the workloads tab of the Project Details/Workloads tab both share the same title: Topology · Red Hat OpenShift Dedicated

The following tabs on the Admin Project page all share the same title. Unsure if we can handle this since it is including the namespace name: sdoyle-dev · Details · Red Hat OpenShift Dedicated. If not, we can drop til 4.14.

  • Project>Project details>Overview
  • Project>Project details>Details
  • Project>Project details>YAML
  • Project>Project details>RoleBindings

4.14 e2e-metal-ipi jobs are failing with 

: [sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel] 

e.g. https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-sdn/1643459330390888448

 

This is the alert that is firing,

promQL query returned unexpected results:
    ALERTS{alertname!~"Watchdog|AlertmanagerReceiversNotConfigured|PrometheusRemoteWriteDesiredShards|KubeJobFailed|Watchdog|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|KubePodNotReady|etcdMembersDown|etcdMembersDown|etcdGRPCRequestsSlow|etcdGRPCRequestsSlow|etcdHighNumberOfFailedGRPCRequests|etcdHighNumberOfFailedGRPCRequests|etcdMemberCommunicationSlow|etcdMemberCommunicationSlow|etcdNoLeader|etcdNoLeader|etcdHighFsyncDurations|etcdHighFsyncDurations|etcdHighCommitDurations|etcdHighCommitDurations|etcdInsufficientMembers|etcdInsufficientMembers|etcdHighNumberOfLeaderChanges|etcdHighNumberOfLeaderChanges|KubeAPIErrorBudgetBurn|KubeAPIErrorBudgetBurn|KubeClientErrors|KubeClientErrors|KubePersistentVolumeErrors|KubePersistentVolumeErrors|MCDDrainError|MCDDrainError|MCDPivotError|MCDPivotError|PrometheusOperatorWatchErrors|PrometheusOperatorWatchErrors|RedhatOperatorsCatalogError|RedhatOperatorsCatalogError|VSphereOpenshiftNodeHealthFail|VSphereOpenshiftNodeHealthFail|SamplesImagestreamImportFailing|SamplesImagestreamImportFailing",alertstate="firing",severity!="info"} >= 1
    [
      {
        "metric": {
          "__name__": "ALERTS",
          "alertname": "TargetDown",
          "alertstate": "firing",
          "job": "catalog-operator-metrics",
          "namespace": "openshift-operator-lifecycle-manager",
          "prometheus": "openshift-monitoring/k8s",
          "service": "catalog-operator-metrics",
          "severity": "warning"
        },
        "value": [
          1680670057.374,
          "1"
        ]
      },

Description of problem:
IHAC (I have a customer) with OCP 4.9 who has configured the IngressControllers with a long httpLogFormat, and the routers print the following warning every time haproxy reloads:

I0927 13:29:45.495077 1 router.go:612] template "msg"="router reloaded" "output"="[WARNING] 269/132945 (9167) : config : truncating capture length to 63 bytes for frontend 'public'.\n[WARNING] 269/132945 (9167) : config : truncating capture length to 63 bytes for frontend 'fe_sni'.\n[WARNING] 269/132945 (9167) : config : truncating capture length to 63 bytes for frontend 'fe_no_sni'.\n - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"

This is the Ingress Controller configuration:

  logging:
    access:
      destination:
        syslog:
          address: 10.X.X.X
          port: 10514
        type: Syslog
      httpCaptureCookies:
      - matchType: Exact
        maxLength: 128
        name: ITXSESSIONID
      httpCaptureHeaders:
        request:
        - maxLength: 128
          name: Host
        - maxLength: 128
          name: itxrequestid
      httpLogFormat: actconn="%ac",backend_name="%b",backend_queue="%bq",backend_source_ip="%bi",backend_source_port="%bp",beconn="%bc",bytes_read="%B",bytes_uploaded="%U",captrd_req_cookie="%CC",captrd_req_headers="%hr",captrd_res_cookie="%CS",captrd_res_headers="%hs",client_ip="%ci",client_port="%cp",cluster="ieec1ocp1",datacenter="ieec1",environment="pro",fe_name_transport="%ft",feconn="%fc",frontend_name="%f",hostname="%H",http_version="%HV",log_type="http",method="%HM",query_string="%HQ",req_date="%tr",request="%HP",res_time="%TR",retries="%rc",server_ip="%si",server_name="%s",server_port="%sp",srv_queue="%sq",srv_conn="%sc",srv_queue="%sq",status_code="%ST",Ta="%Ta",Tc="%Tc",tenant="bk",term_state="%tsc",tot_wait_q="%Tw",Tr="%Tr"
      logEmptyRequests: Ignore

Any way to avoid this truncate warning?

How reproducible:
For every reload of haproxy config

Steps to Reproduce:
You can reproduce easily with the following configuration in the default ingress controller:

logging:
  access:
    destination:
      type: Container
    httpCaptureCookies:
    - matchType: Exact
      maxLength: 128
      name: _abck

And accessing from our console, you will get a log like:

2022-10-18T14:13:53.068164+00:00 xxxx xxxxxx haproxy[38]: 10.39.192.203:40698 [18/Oct/2022:14:13:52.488] fe_sni~ be_secure:openshift-console:console/pod:console-5976495467-zxgxr:console:https:10.128.1.116:8443 0/0/0/10/580 200 1130598 _abck=B7EA642C9E828FA8210F329F80B7B2D80YAAQnVozuFVfkOaDAQAADk - --VN 78/37/33/33/0 0/0 "GET /api/kubernetes/openapi/v2 HTTP/1.1"

Please review the following PR: https://github.com/openshift/cluster-dns-operator/pull/357

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

The csi-snapshot-controller ServiceAccount does not include the HCP pull secret in its imagePullSecrets. Thus, if a HostedCluster is created with a `pullSecret` that contains creds that the management cluster pull secret does not have, the image pull fails.
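A sketch of the expected ServiceAccount (namespace and secret names are illustrative of the hosted control plane layout):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: csi-snapshot-controller
  namespace: <hosted-control-plane-namespace>   # illustrative
imagePullSecrets:
- name: pull-secret                              # the HostedControlPlane pull secret, illustrative name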

Description of problem:

node-exporter profiling shows that ~16% of CPU time is spent fetching details about btrfs mounts. The RHEL kernel doesn't have btrfs, so it's safe to disable this collector (see the sketch below).
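node_exporter supports disabling individual collectors with --no-collector.<name>; a sketch of the DaemonSet change (only the added argument is shown):

      containers:
      - name: node-exporter
        args:
        - --no-collector.btrfs   # btrfs is not present in the RHEL kernel, so the collector only burns CPU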

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Multiple instances of tabs under the ODF dashboard are seen, and sometimes a 404 error is also shown when such a tab is selected and the page is reloaded

https://bugzilla.redhat.com/show_bug.cgi?id=2124829

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

When creating a hosted cluster on a management cluster that has an imagecontentsourcepolicy that does not include openshift-release-dev or ocp/release images, the control plane operator fails reconciliation with an error:

{"level":"error","ts":"2023-08-22T18:26:07Z","msg":"Reconciler error","controller":"hostedcontrolplane","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedControlPlane","HostedControlPlane":{"name":"jiezhao-test","namespace":"clusters-jiezhao-test"},"namespace":"clusters-jiezhao-test","name":"jiezhao-test","reconcileID":"9b3c101b-b4d2-4d9e-b71c-ede9e0b55374","error":"failed to update control plane: failed to reconcile ignition server: failed to parse private registry hosted control plane image reference \"\": repository name must have at least one component","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:326\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}

Version-Release number of selected component (if applicable):

4.14

How reproducible:

always

Steps to Reproduce:

1. Create an ImageContentSourcePolicy on a management cluster:

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: brew-registry
  resourceVersion: "31794"
  uid: 7231c634-da35-4c56-b2ef-be48c2571a9c
spec:
  repositoryDigestMirrors:
  - mirrors:
    - brew.registry.redhat.io
    source: registry.redhat.io
  - mirrors:
    - brew.registry.redhat.io
    source: registry.stage.redhat.io
  - mirrors:
    - brew.registry.redhat.io
    source: registry-proxy.engineering.redhat.com


2. Install the latest hypershift operator and create a hosted cluster with the latest 4.14 ci build

Actual results:

The hostedcluster never creates machines and never gets to a Complete state

Expected results:

The hostedcluster comes up and gets to a Complete state

Additional info:

 

Description of problem:

EgressIP was NOT migrated to the correct worker after deleting the machine it was assigned to, in a GCP XPN cluster.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-03-29-235439

How reproducible:

Always

Steps to Reproduce:

1. Set up GCP XPN cluster.
2. Scale two new worker nodes
% oc scale --replicas=2 machineset huirwang-0331a-m4mws-worker-c -n openshift-machine-api        
machineset.machine.openshift.io/huirwang-0331a-m4mws-worker-c scaled

3. Wait for the two new worker nodes to be ready.
 % oc get machineset -n openshift-machine-api
NAME                            DESIRED   CURRENT   READY   AVAILABLE   AGE
huirwang-0331a-m4mws-worker-a   1         1         1       1           86m
huirwang-0331a-m4mws-worker-b   1         1         1       1           86m
huirwang-0331a-m4mws-worker-c   2         2         2       2           86m
huirwang-0331a-m4mws-worker-f   0         0                             86m
% oc get nodes
NAME                                                          STATUS   ROLES                  AGE     VERSION
huirwang-0331a-m4mws-master-0.c.openshift-qe.internal         Ready    control-plane,master   82m     v1.26.2+dc93b13
huirwang-0331a-m4mws-master-1.c.openshift-qe.internal         Ready    control-plane,master   82m     v1.26.2+dc93b13
huirwang-0331a-m4mws-master-2.c.openshift-qe.internal         Ready    control-plane,master   82m     v1.26.2+dc93b13
huirwang-0331a-m4mws-worker-a-hfqsn.c.openshift-qe.internal   Ready    worker                 71m     v1.26.2+dc93b13
huirwang-0331a-m4mws-worker-b-vbqf2.c.openshift-qe.internal   Ready    worker                 71m     v1.26.2+dc93b13
huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal   Ready    worker                 8m22s   v1.26.2+dc93b13
huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal   Ready    worker                 8m22s   v1.26.2+dc93b13
4. Label one new worker node as an egress node
 % oc label node huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal k8s.ovn.org/egress-assignable="" 
node/huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal labeled

5. Create the egressIP object
oc get egressIP
NAME         EGRESSIPS     ASSIGNED NODE                                                 ASSIGNED EGRESSIPS
egressip-1   10.0.32.100   huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal   10.0.32.100
6. Label the second new worker node as an egress node
% oc label node huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal k8s.ovn.org/egress-assignable="" 
node/huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal labeled
7. Delete the assigned egress node
% oc delete machines.machine.openshift.io huirwang-0331a-m4mws-worker-c-rhbkr  -n openshift-machine-api
machine.machine.openshift.io "huirwang-0331a-m4mws-worker-c-rhbkr" deleted
 % oc get nodes
NAME                                                          STATUS   ROLES                  AGE   VERSION
huirwang-0331a-m4mws-master-0.c.openshift-qe.internal         Ready    control-plane,master   87m   v1.26.2+dc93b13
huirwang-0331a-m4mws-master-1.c.openshift-qe.internal         Ready    control-plane,master   86m   v1.26.2+dc93b13
huirwang-0331a-m4mws-master-2.c.openshift-qe.internal         Ready    control-plane,master   87m   v1.26.2+dc93b13
huirwang-0331a-m4mws-worker-a-hfqsn.c.openshift-qe.internal   Ready    worker                 76m   v1.26.2+dc93b13
huirwang-0331a-m4mws-worker-b-vbqf2.c.openshift-qe.internal   Ready    worker                 76m   v1.26.2+dc93b13
huirwang-0331a-m4mws-worker-c-wnm4r.c.openshift-qe.internal   Ready    worker                 13m   v1.26.2+dc93b13
W0331 02:48:34.917391       1 egressip_healthcheck.go:162] Could not connect to huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal (10.129.4.2:9107): context deadline exceeded
W0331 02:48:34.917417       1 default_network_controller.go:903] Node: huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal is not ready, deleting it from egress assignment
I0331 02:48:34.917590       1 client.go:783]  "msg"="transacting operations" "database"="OVN_Northbound" "operations"="[{Op:update Table:Logical_Switch_Port Row:map[options:{GoMap:map[router-port:rtoe-GR_huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[where column _uuid == {6efd3c58-9458-44a2-a43b-e70e669efa72}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]"
E0331 02:48:34.920766       1 egressip.go:993] Allocator error: EgressIP: egressip-1 assigned to node: huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal which is not reachable, will attempt rebalancing
E0331 02:48:34.920789       1 egressip.go:997] Allocator error: EgressIP: egressip-1 assigned to node: huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal which is not ready, will attempt rebalancing
I0331 02:48:34.920808       1 egressip.go:1212] Deleting pod egress IP status: {huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal 10.0.32.100} for EgressIP: egressip-1

Actual results:

The egressIP was not migrated to the correct worker
 oc get egressIP      
NAME         EGRESSIPS     ASSIGNED NODE                                                 ASSIGNED EGRESSIPS
egressip-1   10.0.32.100   huirwang-0331a-m4mws-worker-c-rhbkr.c.openshift-qe.internal   10.0.32.100

Expected results:

The egressIP should be migrated to the correct worker from the deleted node.

Additional info:


Description of problem:

In ROSA, the user can specify a HostPrefix, but we are currently not passing it to the HostedCluster CR. Trying to fix it, it seems that we are not setting it up correctly on the Nodes.

Version-Release number of selected component (if applicable):

4.12.16

How reproducible:

Always

Steps to Reproduce:

1. Create an HC. Inside the spec add 
  networking:
    clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 25
2. Deploy the HC. Check its configuration. 

Actual results:

oc get network cluster is showing the right config (see attachment) 
An oc describe node is always showing a /24 hostPrefix.

Note that this is valid also with the default value of /23. In the node, under podCIDR I always see something like
PodCIDR:                                   10.128.1.0/24 
PodCIDRs:                                  10.128.1.0/24 
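With clusterNetwork 10.128.0.0/14 and hostPrefix 25, every node should be handed a /25; a quick way to compare (illustrative check):

oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# expected: a /25 per node; observed: a /24 per node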

Expected results:

I would expect the configured hostPrefix to be reflected in the node's podCIDR mask

Additional info:

pod cidr is correctly set

Description of problem:

The DaemonSet cni-sysctl-allowlist-ds is missing the annotation for workload partitioning.

Version-Release number of selected component (if applicable):

 

How reproducible:

Running the DaemonSet shows the pod is missing the workload annotation

Steps to Reproduce:

1. Run Daemonset
2.
3.

Actual results:

No workload annotation present.

Expected results:

annotations:
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
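One way to verify the fix (assuming the DaemonSet lives in the openshift-multus namespace):

oc -n openshift-multus get ds cni-sysctl-allowlist-ds \
  -o jsonpath='{.spec.template.metadata.annotations.target\.workload\.openshift\.io/management}'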

Additional info:

 

Version:
4.13.0-rc.3
advanced-cluster-management.v2.8.0-123
multicluster-engine.v2.3.0-125

Attempted to deploy SNO spoke on an ARM machine.
The preprovisioningimage is not ready - InfraEnvArchMismatch
The BMH is stuck in provisioning:

 
oc get bmh -A
NAMESPACE    NAME         STATE          CONSUMER   ONLINE   ERROR   AGE
nvd-srv-15   nvd-srv-15   provisioning              true             32m

oc get preprovisioningimages.metal3.io -A
NAMESPACE    NAME         READY   REASON
nvd-srv-15   nvd-srv-15   False   InfraEnvArchMismatch
 
oc get bmh nvd-srv-15 -o yaml apiVersion: metal3.io/v1alpha1 kind: BareMetalHost metadata: annotations: bmac.agent-install.openshift.io/hostname: nvd-srv-15 bmac.agent-install.openshift.io/role: master inspect.metal3.io: disabled creationTimestamp: "2023-05-02T19:08:05Z" finalizers: - baremetalhost.metal3.io generation: 2 labels: infraenvs.agent-install.openshift.io: nvd-srv-15 name: nvd-srv-15 namespace: nvd-srv-15 resourceVersion: "16766175" uid: 0ed0a685-e171-46a9-a4de-9367a53a4060 spec: automatedCleaningMode: disabled bmc: address: redfish-virtualmedia://10.8.232.14/redfish/v1/Systems/Self credentialsName: bmc-secret1 disableCertificateVerification: true bootMACAddress: 74:56:3c:40:0b:c4 customDeploy: method: start_assisted_install online: true rootDeviceHints: deviceName: /dev/nvme0n1 status: errorCount: 0 errorMessage: "" goodCredentials: credentials: name: bmc-secret1 namespace: nvd-srv-15 credentialsVersion: "16766003" hardwareProfile: unknown lastUpdated: "2023-05-02T19:08:16Z" operationHistory: deprovision: end: null start: null inspect: end: null start: null provision: end: null start: "2023-05-02T19:08:16Z" register: end: "2023-05-02T19:08:16Z" start: "2023-05-02T19:08:05Z" operationalStatus: OK poweredOn: false provisioning: ID: 797cde66-823c-4cac-85eb-d81293e11eac bootMode: UEFI image: url: "" rootDeviceHints: deviceName: /dev/nvme0n1 state: provisioning triedCredentials: credentials: name: bmc-secret1 namespace: nvd-srv-15 credentialsVersion: "16766003" 
 oc get preprovisioningimages.metal3.io nvd-srv-15 -o yaml apiVersion: metal3.io/v1alpha1 kind: PreprovisioningImage metadata: creationTimestamp: "2023-05-02T19:08:16Z" generation: 13 labels: infraenvs.agent-install.openshift.io: nvd-srv-15 name: nvd-srv-15 namespace: nvd-srv-15 ownerReferences: - apiVersion: metal3.io/v1alpha1 blockOwnerDeletion: true controller: true kind: BareMetalHost name: nvd-srv-15 uid: 0ed0a685-e171-46a9-a4de-9367a53a4060 resourceVersion: "16782790" uid: 6edafb9a-3d22-41ca-b920-01011b679285 spec: acceptFormats: - iso - initrd architecture: x86_64 status: conditions: - lastTransitionTime: "2023-05-02T19:08:16Z" message: PreprovisioningImage CPU architecture (x86_64) does not match InfraEnv CPU architecture (arm64) observedGeneration: 13 reason: InfraEnvArchMismatch status: "False" type: Ready - lastTransitionTime: "2023-05-02T19:08:16Z" message: PreprovisioningImage CPU architecture (x86_64) does not match InfraEnv CPU architecture (arm64) observedGeneration: 13 reason: InfraEnvArchMismatch status: "True" type: Error networkData: {} 
oc get infraenv nvd-srv-15 -o yaml apiVersion: agent-install.openshift.io/v1beta1 kind: InfraEnv metadata: annotations: infraenv.agent-install.openshift.io/enable-ironic-agent: "true" creationTimestamp: "2023-05-02T18:38:25Z" finalizers: - infraenv.agent-install.openshift.io/ai-deprovision generation: 2 name: nvd-srv-15 namespace: nvd-srv-15 resourceVersion: "16781762" uid: cf895c62-4336-475c-847e-f29ea9717b2c spec: agentLabels: bla: aaa clusterRef: name: nvd-srv-15 namespace: nvd-srv-15 cpuArchitecture: arm64 ipxeScriptType: DiscoveryImageAlways nmStateConfigLabelSelector: {} pullSecretRef: name: pull-secret sshAuthorizedKey: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC2HqaPvD2gpaVNOuCfrv2RMweuV/u+N/bd2BiJZtarie6Hn/YwNR9MlMZIdmO+gvlsE1nRwx8drQ13OqqcPAV3FoDwc5vG6kQegKhvJ7xGT8iW0VM7TK9kQQitmsv4BVC14m8PCBG3gCUpLwhbbfupbq4HSNPY53pPKwhfmyU0YgblOyIDBz01kUKEC30yKSsLknlnlMV5DxbHJj74Zc3e09rEV3yEBEcz3VHlBLciddBfJ/L+1fRgJsosANOZ7mGvZm98a4AJ8RD/Lg4BH2bcWYYZKI46cH/FRPNVSejCVVwHU/wRlNLPjxDvrv+kO74mrI272C8RwwS1Iotf1C5uMgdfVYHj/aVOQUsgLPZ3car7NYBMs9cGx2tGwZH9C8TQFkHZu8hyUDuAbqpBbYbCHjT4PGImaLX44BA2w69ZlWg4gIQslgrhWlmxkO5X8bhStGhxWvuXVWXKU56Sy9DEfcZPvWBftwLAlTl5pwdldKc970nBoTXdqMeEb9LX4OU= kni@r640-u01.qe1.kni.lab.eng.bos.redhat.com status: agentLabelSelector: matchLabels: infraenvs.agent-install.openshift.io: nvd-srv-15 bootArtifacts: initrd: https://assisted-image-service-multicluster-engine.apps.qe4.kni.lab.eng.bos.redhat.com/images/9101ee53-a224-47e3-bea8-215d1246b727/pxe-initrd?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI5MTAxZWU1My1hMjI0LTQ3ZTMtYmVhOC0yMTVkMTI0NmI3MjcifQ.NilxXsInfATeOgKqynrcHbeZBPVb2THamekMFc9qiwuq9ZSiaRQCW3aHB793ARv3vGzdkiP1DMWQRczXhzchHg&arch=arm64&version=4.13-arm ipxeScript: https://assisted-service-multicluster-engine.apps.qe4.kni.lab.eng.bos.redhat.com/api/assisted-install/v2/infra-envs/9101ee53-a224-47e3-bea8-215d1246b727/downloads/files?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI5MTAxZWU1My1hMjI0LTQ3ZTMtYmVhOC0yMTVkMTI0NmI3MjcifQ.sVjoWwISac6jEnFcKGm6re6my6eYZ-3b51YA9pB89D7v1tJmHX6bWljKCxthW10OKOCWymaFgVlnG7ga_QnPJA&file_name=ipxe-script kernel: https://assisted-image-service-multicluster-engine.apps.qe4.kni.lab.eng.bos.redhat.com/boot-artifacts/kernel?arch=arm64&version=4.13-arm rootfs: https://assisted-image-service-multicluster-engine.apps.qe4.kni.lab.eng.bos.redhat.com/boot-artifacts/rootfs?arch=arm64&version=4.13-arm conditions: - lastTransitionTime: "2023-05-02T18:38:25Z" message: Image has been created reason: ImageCreated status: "True" type: ImageCreated createdTime: "2023-05-02T19:25:59Z" debugInfo: eventsURL: https://assisted-service-multicluster-engine.apps.qe4.kni.lab.eng.bos.redhat.com/api/assisted-install/v2/events?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI5MTAxZWU1My1hMjI0LTQ3ZTMtYmVhOC0yMTVkMTI0NmI3MjcifQ.uBPr7qi6NjnCm-kEtf-uB0KoRZ_6JqMhOUP0BU1cpe9tLa-O6eQCMQT34Jg4J1kBmagHnfYftH50H45L-5zkWA&infra_env_id=9101ee53-a224-47e3-bea8-215d1246b727 isoDownloadURL: https://assisted-image-service-multicluster-engine.apps.qe4.kni.lab.eng.bos.redhat.com/images/9101ee53-a224-47e3-bea8-215d1246b727?api_key=eyJhbGciOiJFUzI1NiIsInR5cCI6IkpXVCJ9.eyJpbmZyYV9lbnZfaWQiOiI5MTAxZWU1My1hMjI0LTQ3ZTMtYmVhOC0yMTVkMTI0NmI3MjcifQ.bWtmfbro67Q6dnoTHdiDy7lmeqNMZ4rUXABfdbhjWqveRzysk1Egk0ba9CXrl3rFRwyLPTuEawtQ_GikpivBOw&arch=arm64&type=minimal-iso&version=4.13-arm 

Description of problem:

Topology UI doesn't recognize Serverless Rust function for proper UI icon

Version-Release number of selected component (if applicable):

4.12.0

How reproducible:

Always

Steps to Reproduce:

1. Deploy 3 KNative/Serverless functions: Quarkus, Spring Boot, Rust
2. Observe in the Topology UI that specific icons are used only for Quarkus and Spring Boot, while the Rust function gets the generic OpenShift icon
3. Check each of the presented UI nodes and note the related labels:
For Quarkus: 
app.openshift.io/runtime=quarkus
function.knative.dev/runtime=rust

For Spring Boot:
app.openshift.io/runtime=spring-boot
function.knative.dev/runtime=springboot

For Rust:
function.knative.dev/runtime=rust (no app.openshift.io/runtime=rust label is present for it; a possible workaround is sketched under Additional info below)

Actual results:

No specific UI icon for Rust function

Expected results:

Specific UI icon for Rust function

Additional info:
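A possible manual workaround, assuming the Topology icon is keyed off the app.openshift.io/runtime label (the service name below is hypothetical), is to add that label to the Rust function's Knative Service yourself:

$ oc label services.serving.knative.dev devfile-sample-rust app.openshift.io/runtime=rust

Whether the console picks the label up from the Knative Service rather than from the underlying workload is an assumption here.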

 

Extend multus resource collection so that we gather all resources on a per namespace basis with oc adm inspect.
This way, users can create a combined must-gather with all resources in one place.

We might have to revisit this once the reconciler and other changes land in more recent versions of multus, but for the time being this is a good change to make, and one that we can also backport to older versions.
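For reference, a sketch of the kind of per-namespace collection this relies on (the namespace and destination directory here are just examples):

$ oc adm inspect ns/openshift-multus --dest-dir=./inspect-multus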

Description of problem:

Noticed an issue with the ignition server when testing some of the latest HO updates on our older control planes:
❯ oc logs ignition-server-5fd4c89764-bddss -n master-roks-dev-4-9
Defaulted container "ignition-server" out of: ignition-server, fetch-feature-gate (init)
Error: unknown flag: --feature-gate-manifest
This seems to be thrown because that flag doesn't exist within the ignition server source code for previous control plane versions--we're specifically only seeing this in 4.9 and 4.10, where the ignition server was not being managed by CPO.

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

1. Install HO off main
2. Bring up 4.9/4.10 hosted control planes
3. Ignition server crashes

Actual results:

Ignition server crashes

Expected results:

Ignition server to run without issues

Additional info:

 

Description of problem:

For HOSTEDCP-1062, components without the label `hypershift.openshift.io/need-management-kas-access: "true"` cannot access management cluster KAS resources.
However, the kube-apiserver pod in the HCP does not carry the target label hypershift.openshift.io/need-management-kas-access: "true", yet it can still access the mgmt KAS.


jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jie-test | grep kube-apiserver
kube-apiserver-6799b6cfd8-wk8pv                      3/3     Running   0          178m
jiezhao-mac:hypershift jiezhao$ 
jiezhao-mac:hypershift jiezhao$ oc get pods kube-apiserver-6799b6cfd8-wk8pv -n clusters-jie-test -o yaml | grep hypershift.openshift.io/need-management-kas-access
jiezhao-mac:hypershift jiezhao$ 

jiezhao-mac:hypershift jiezhao$ oc -n clusters-jie-test rsh pod/kube-apiserver-6799b6cfd8-wk8pv curl --connect-timeout 2 -Iks https://10.0.142.255:6443 -v
Defaulted container "apply-bootstrap" out of: apply-bootstrap, kube-apiserver, audit-logs, init-bootstrap (init), wait-for-etcd (init)
* Rebuilt URL to: https://10.0.142.255:6443/
..
< HTTP/2 403 
HTTP/2 403 
...
< 
* Connection #0 to host 10.0.142.255 left intact

How reproducible:

refer test case: https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-65141

Steps to Reproduce:

https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-65141 

Additional info:

The router pod has the label and can access the mgmt KAS. My expectation is that the router pod shouldn't have the label and shouldn't access the mgmt KAS.
$ oc get pods router-667cb7f844-lx8mv -n clusters-jie-test -o yaml | grep hypershift.openshift.io/need-management-kas-access
hypershift.openshift.io/need-management-kas-access: "true"
jiezhao-mac:hypershift jiezhao$ oc -n clusters-jie-test rsh pod/router-667cb7f844-lx8mv curl --connect-timeout 2 -Iks https://10.0.142.255:6443 -v
Rebuilt URL to: https://10.0.142.255:6443/
  Trying 10.0.142.255...
...
< HTTP/2 403
HTTP/2 403
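One way to enumerate which pods in the hosted control plane namespace actually carry the label (namespace taken from this report):

$ oc get pods -n clusters-jie-test -l hypershift.openshift.io/need-management-kas-access=true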

> Actually, router doesn't need it anymore after https://github.com/openshift/hypershift/pull/2778 

Hypershift needs to be able to specify a different release payload for control plane components without redeploying anything in the hosted cluster.

ovnkube-node DaemonSet pods in the hosted cluster and the ovnkube-master pods that run in the control plane both use the same ovn-kubernetes image passed to the CNO.

https://github.com/openshift/hypershift/blob/fc42313fc93125799f7eba5361190043cc2f6561/control-plane-operator/controllers/hostedcontrolplane/cno/clusternetworkoperator.go#L90

We need a way to specify these images separately for ovnkube-node and ovnkube-master.

Background:
https://docs.google.com/document/d/1a3tAS_K6lQ2iicjvuIvPIK5lervXFEVQBCAXopBAJ6o/edit

1. Proposed title of this feature request

Allow the Ingress log length to be modified when using a sidecar

2. What is the nature and description of the request?

In the past we had RFE-1794, where an option was created to specify the length of the HAProxy log; however, this option was only available when redirecting the log to an external syslog. We need this option to be available when using a sidecar to collect the logs.

 

apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: default
  namespace: openshift-ingress-operator
spec:
  replicas: 2
  logging:
    access:
      destination:
        type: Container
        container: {}

Unlike the Syslog type, the Container type does not have any sub-parameters, which makes it impossible to configure the log length.

As we can see in RFE-1794, the option to change the log length already exists in the HAProxy configuration, but when using the sidecar, only the default value (1024) is used.
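For contrast, a sketch of how the length can be raised today with the Syslog destination, assuming the maxLength field added by RFE-1794 (the syslog address and the chosen value are placeholders; the allowed range is not asserted here):

$ oc -n openshift-ingress-operator patch ingresscontroller/default --type=merge \
    -p '{"spec":{"logging":{"access":{"destination":{"type":"Syslog","syslog":{"address":"1.2.3.4","port":514,"maxLength":4096}}}}}}'

The request in this RFE is to expose an equivalent knob for the Container destination.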

3. Why does the customer need this? (List the business requirements here)

The default log length of HAProxy is 1024. When clients communicate with the application using long URI arguments, the full access log and parameter information cannot be captured. An option to set the length to 8192 or higher is required.

4. List any affected packages or components.

  • haproxy
  • ingress
  • ingress-operator

This is a clone of issue OCPBUGS-18103. The following is the description of the original issue:

Description:

Now that the large number of e2e test case failures in CI jobs has been resolved, an "Undiagnosed panic detected in pod" issue was observed in the recent jobs.

JobLink

Error:

{ pods/openshift-image-registry_cluster-image-registry-operator-7f7bd7c9b4-k8fmh_cluster-image-registry-operator_previous.log.gz:E0825 02:44:06.686400 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference) pods/openshift-image-registry_cluster-image-registry-operator-7f7bd7c9b4-k8fmh_cluster-image-registry-operator_previous.log.gz:E0825 02:44:06.686630 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)}

Some observations:
1) While starting ImageConfigController, it failed to watch *v1.Route, as "the server could not find the requested resource",

2) which eventually led to a sync problem: "E0825 01:26:52.428694       1 clusteroperator.go:104] unable to sync ClusterOperatorStatusController: config.imageregistry.operator.openshift.io "cluster" not found, requeuing"

3) and then, while creating the deployment resource for "cluster-image-registry-operator", it caused a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)

Description of problem:

node-driver-registrar and hostpath containers in pod shared-resource-csi-driver-node-xxxxx under openshift-cluster-csi-drivers namespace are not pinned to reserved management cores.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1. Deploy SNO via ZTP with workload partitioning enabled
2. Check mgmt pods affinity
3.

Actual results:

The pods do not have the workload partitioning annotation and are not pinned to management cores.

Expected results:

All management pods should be pinned to reserved cores

Pod should be annotated with: target.workload.openshift.io/management: '{"effect":"PreferredDuringScheduling"}'

Additional info:

pod metadata

metadata:
  annotations:
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["fd01:0:0:1::5f/64"],"mac_address":"0a:58:97:51:ad:31","gateway_ips":["fd01:0:0:1::1"],"ip_address":"fd01:0:0:1::5f/64","gateway_ip":"fd01:0:0:1::1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "fd01:0:0:1::5f"
          ],
          "mac": "0a:58:97:51:ad:31",
          "default": true,
          "dns": {}
      }]
    k8s.v1.cni.cncf.io/networks-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "fd01:0:0:1::5f"
          ],
          "mac": "0a:58:97:51:ad:31",
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: privileged
/var/lib/jenkins/workspace/ocp-far-edge-vran-deployment/cnf-gotests/test/ran/workloadpartitioning/tests/workload_partitioning.go:113


SNO management workload partitioning [It] should have management pods pinned to reserved cpus
/var/lib/jenkins/workspace/ocp-far-edge-vran-deployment/cnf-gotests/test/ran/workloadpartitioning/tests/workload_partitioning.go:113

  [FAILED] Expected
      <[]ranwphelper.ContainerInfo | len:3, cap:4>: [
          {
              Name: "hostpath",
              Cpus: "2-55,58-111",
              Namespace: "openshift-cluster-csi-drivers",
              PodName: "shared-resource-csi-driver-node-vzvtc",
              Shares: 10,
              Pid: 41650,
          },
          {
              Name: "cluster-proxy-service-proxy",
              Cpus: "2-55,58-111",
              Namespace: "open-cluster-management-agent-addon",
              PodName: "cluster-proxy-service-proxy-66599b78bf-k2dvr",
              Shares: 2,
              Pid: 35093,
          },
          {
              Name: "node-driver-registrar",
              Cpus: "2-55,58-111",
              Namespace: "openshift-cluster-csi-drivers",
              PodName: "shared-resource-csi-driver-node-vzvtc",
              Shares: 10,
              Pid: 34782,
          },
      ]
  to be empty
  In [It] at: /var/lib/jenkins/workspace/ocp-far-edge-vran-deployment/cnf-gotests/test/ran/workloadpartitioning/ranwphelper/ranwphelper.go:172 @ 02/22/23 01:05:00.268

cluster-proxy-service-proxy is reported in https://issues.redhat.com/browse/OCPBUGS-7652

Description of problem:

The external link icon in the `resource added` toast notification is not linked and cannot be clicked to open the app URL.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

 

Steps to Reproduce:

1. use the +Add page and import from git
2. after creating the app a toast notification will appear
3. Click the external link icon 

Actual results:

External link icon is not part of the link but has a pointer cursor and a hover effect. Clicking this icon does nothing.

Expected results:

External link icon should be part of the link and clickable.

Additional info:

 

This is a clone of issue OCPBUGS-25948. The following is the description of the original issue:

This is a clone of issue OCPBUGS-25362. The following is the description of the original issue:

Description of problem:


    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

For reasons I still struggle to understand, in trying to mitigate issues stemming from the PSA changes to k8s, we decided on a convoluted architecture where one reconciler by one team (cluster-policy-controller) ignores openshift-* namespaces unless they have a specific label and are not part of the payload, while a reconciler on our team labels non-payload openshift-* namespaces appropriately so that the first one will do its security magic and keep workloads stable during this transition. This cockamamie scheme led to a dependency between olm and cpc s.t. we can share the list of payload openshift-* namespaces.

This also means that we need to update the dependency at each release to keep parity with the OCP version of the dependency and olm.

We need to update the cpc dependency, as the pipeline is blocked until we do (to avoid leaving an old version of the dependency in place, perhaps with a different list of payload openshift-* namespaces, breaking customer clusters or impacting their experience).

Note: this is currently blocking ART compliance PRs. We need to get this in ASAP.

This is actually a better design since BMO does not need to be coupled with Ironic (unlike Ironic and httpd, for example). But the current architecture also has two real issues:

  1. BMO needs to know the IP address of Ironic, which causes a chicken-and-egg problem: the IP is not known until the pod starts.
  2. Since BMO is a part of the Metal3 pod, it also uses host networking and other privileges. For example, the webhook port is exposed externally.

The main thing to fix is to make BMO talk to Ironic via its external IP instead of localhost.

Description of problem:

This is a clone for https://issues.redhat.com/browse/CNV-26608

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

On the openshift/console master branch, a devfile import fails by default. I have noticed that when a repository url has a .git extension, the pod fails due to a bug where the container image is trying to pull from dockerhub rather than the openshift image registry. For example, the container image is Image:          devfile-sample-code-with-quarkus.git:latest but the image from the imagestreamtag is image-registry.openshift-image-registry.svc:5000/maysun/devfile-sample-code-with-quarkus.git@sha256:e6aa9d29be48b33024eb271665d11a7557c9f140c9bd58aeb19fe4570fffb421.

A pod describe shows the expected error "Failed to pull image "devfile-sample-code-with-quarkus.git:latest": rpc error: code = Unknown desc = reading manifest latest in docker.io/library/devfile-sample-code-with-quarkus.git: requested access to the resource is denied".

However, during import, if you remove the .git extension from the repository link, the import is successful.

I only see this on the master branch and it seems to be fine on my local crc which is on OpenShift version: 4.13.0

Version-Release number of selected component (if applicable):

4.13.z

How reproducible:

Always

Steps to Reproduce:

1. Build from openshift/console master
2. Import Devfile sample
3. If repo has a .git extension, pod fails with the wrong image

Actual results:

POD describe:

Failed to pull image "devfile-sample-code-with-quarkus.git:latest": rpc error: code = Unknown desc = reading manifest latest in docker.io/library/devfile-sample-code-with-quarkus.git: requested access to the resource is denied

Expected results:

Successful running pod

Additional info:

Fine on Openshift 4.13.0, tested on local crc:

$ crc version
WARN A new version (2.23.0) has been published on https://developers.redhat.com/content-gateway/file/pub/openshift-v4/clients/crc/2.23.0/crc-macos-installer.pkg 
CRC version: 2.20.0+f3a947
OpenShift version: 4.13.0
Podman version: 4.4.4

This is a clone of issue OCPBUGS-19492. The following is the description of the original issue:

Description of problem:

Keepalived constantly fails on bootstrap causing installation failure

It seems the keepalived.conf file is missing and the keepalived monitor fails.
Version-Release number of selected component (if applicable):

4.13.12

How reproducible:

Regular installation through assisted installer 

Steps to Reproduce:

1.
2.
3.

Actual results:

keepalived fails to start

Expected results:

Success

Additional info:

Description of problem:

An empty page is returned when a normal user tries to view the Route Metrics page.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-13-223353

How reproducible:

Always

Steps to Reproduce:

1. Check any route's Metrics page as a cluster-admin user, for example /k8s/ns/openshift-monitoring/routes/alertmanager-main/metrics; the route metrics page and charts load successfully
2. Grant a normal user admin permission on the 'openshift-monitoring' project
$ oc adm policy add-role-to-user admin testuser-1 -n openshift-monitoring
clusterrole.rbac.authorization.k8s.io/admin added: "testuser-1"
3. Log in as the normal user 'testuser-1' and check the Networking -> Routes -> alertmanager-main -> Metrics page again

Actual results:

3. An empty page is returned

Expected results:

3. If a normal user doesn't have the ability to view route metrics, we should either hide the 'Metrics' tab or show an error message instead of a totally empty page

Additional info:

 

Description of problem:

When running the installer on OSP with:

[...]
controlPlane:
  name: master
  platform: {}
  replicas: 3
[...]

in the install-config.yaml, it panics:

DEBUG OpenShift Installer 4.14.0-0.nightly-2023-07-20-215234
DEBUG Built from commit 1e9209ac80ed2cb4ba5663f519e51161a1d8858a
DEBUG Fetching Metadata...
DEBUG Loading Metadata...
DEBUG   Loading Cluster ID...
DEBUG     Loading Install Config...
DEBUG       Loading SSH Key...
DEBUG       Loading Base Domain...
DEBUG         Loading Platform...
DEBUG       Loading Cluster Name...
DEBUG         Loading Base Domain...
DEBUG         Loading Platform...
DEBUG       Loading Networking...
DEBUG         Loading Platform...
DEBUG       Loading Pull Secret...
DEBUG       Loading Platform...
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x3956f6d]

goroutine 1 [running]:
github.com/openshift/installer/pkg/types/conversion.convertOpenStack(0xc000464dc0)
        /go/src/github.com/openshift/installer/pkg/types/conversion/installconfig.go:172 +0x1cd
github.com/openshift/installer/pkg/types/conversion.ConvertInstallConfig(0xc000464dc0)
        /go/src/github.com/openshift/installer/pkg/types/conversion/installconfig.go:47 +0x2af
github.com/openshift/installer/pkg/asset/installconfig.(*AssetBase).LoadFromFile(0xc000a18180, {0x20f8c650?, 0xc000696b40?})                                                                                                                 
        /go/src/github.com/openshift/installer/pkg/asset/installconfig/installconfigbase.go:64 +0x32b
github.com/openshift/installer/pkg/asset/installconfig.(*InstallConfig).Load(0xc000a18180, {0x20f8c650?, 0xc000696b40?})                                                                                                                     
        /go/src/github.com/openshift/installer/pkg/asset/installconfig/installconfig.go:118 +0x2e
github.com/openshift/installer/pkg/asset/store.(*storeImpl).load(0xc0008f3f20, {0x20f95950, 0xc0002f9a40}, {0xc000af060c, 0x4})                                                                                                              
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:263 +0x35f
github.com/openshift/installer/pkg/asset/store.(*storeImpl).load(0xc0008f3f20, {0x20f95920, 0xc00040cf60}, {0x819d89a, 0x2})                                                                                                                 
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:246 +0x256
github.com/openshift/installer/pkg/asset/store.(*storeImpl).load(0xc0008f3f20, {0x7fed58b9ec98, 0x25ba8530}, {0x0, 0x0})                                                                                                                     
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:246 +0x256
github.com/openshift/installer/pkg/asset/store.(*storeImpl).fetch(0xc0008f3f20, {0x7fed58b9ec98, 0x25ba8530}, {0x0, 0x0})                                                                                                                    
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:200 +0x1a9
github.com/openshift/installer/pkg/asset/store.(*storeImpl).Fetch(0x7ffd6b4992ff?, {0x7fed58b9ec98, 0x25ba8530}, {0x25b8ea80, 0x8, 0x8})                                                                                                     
        /go/src/github.com/openshift/installer/pkg/asset/store/store.go:76 +0x48
main.runTargetCmd.func1({0x7ffd6b4992ff, 0x6})
        /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:260 +0x126
main.runTargetCmd.func2(0x25b96920?, {0xc0002f8100?, 0x4?, 0x4?})
        /go/src/github.com/openshift/installer/cmd/openshift-install/create.go:290 +0xe7
github.com/spf13/cobra.(*Command).execute(0x25b96920, {0xc0002f80c0, 0x4, 0x4})
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:920 +0x847
github.com/spf13/cobra.(*Command).ExecuteC(0xc000a0c000)
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:1040 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
        /go/src/github.com/openshift/installer/vendor/github.com/spf13/cobra/command.go:968
main.installerMain()
        /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:61 +0x2b0
main.main()
        /go/src/github.com/openshift/installer/cmd/openshift-install/main.go:38 +0xff

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-20-215234

How reproducible:

Always

Steps to Reproduce:

1. Create the install-config.yaml with an empty controlPlane.platform
2. Run the installer

Actual results:

Panic

Expected results:

Controlled error message if the platform is strictly necessary, otherwise a successful installation.

Additional info:

 

Description of problem:
An unprivileged user with the cluster-reader role cannot view the NetworkAttachmentDefinition resource.

Version-Release number of selected component (if applicable):
oc Version: 4.10.0-202203141248.p0.g6db43e2.assembly.stream-6db43e2
OCP Version: 4.10.4
Kubernetes Version: v1.23.3+e419edf
ose-multus-cni:v4.1.0-7.155662231

How reproducible:
100%

Steps to Reproduce:
1. In an OCP cluster with multus installed, search which roles can view ("get") the NetworkAttachmentDefinition resource and check whether the "cluster-reader" role is part of this list by running:
$ oc adm policy who-can get network-attachment-definitions | grep "cluster-reader"

Actual results:
Empty output

Expected results:
Non-empty output with "cluster-readers" in it, e.g. when running the same command for the Namespace resource:
$ oc adm policy who-can get namespace | grep "cluster-reader"
system:cluster-readers
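A minimal sketch of one possible fix: an aggregated ClusterRole that grants read access and gets rolled into cluster-reader (assuming the standard aggregate-to-cluster-reader label is honoured; the role name is hypothetical):

$ oc apply -f - <<'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: net-attach-def-cluster-reader   # hypothetical name
  labels:
    rbac.authorization.k8s.io/aggregate-to-cluster-reader: "true"
rules:
- apiGroups: ["k8s.cni.cncf.io"]
  resources: ["network-attachment-definitions"]
  verbs: ["get", "list", "watch"]
EOF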

Description of problem:

revert "force cert rotation every couple days for development" in 4.13

Below are the steps to verify this bug:

# oc adm release info --commits registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-06-25-081133|grep -i cluster-kube-apiserver-operator
  cluster-kube-apiserver-operator                https://github.com/openshift/cluster-kube-apiserver-operator                7764681777edfa3126981a0a1d390a6060a840a3

# git log --date local --pretty="%h %an %cd - %s" 776468 |grep -i "#1307"
08973b820 openshift-ci[bot] Thu Jun 23 22:40:08 2022 - Merge pull request #1307 from tkashem/revert-cert-rotation

# oc get clusterversions.config.openshift.io 
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-25-081133   True        False         64m     Cluster version is 4.11.0-0.nightly-2022-06-25-081133

$ cat scripts/check_secret_expiry.sh
FILE="$1"
if [ ! -f "$1" ]; then
  echo "must provide \$1" && exit 0
fi
export IFS=$'\n'
for i in `cat "$FILE"`
do
  if `echo "$i" | grep "^#" > /dev/null`; then
    continue
  fi
  NS=`echo $i | cut -d ' ' -f 1`
  SECRET=`echo $i | cut -d ' ' -f 2`
  rm -f tls.crt; oc extract secret/$SECRET -n $NS --confirm > /dev/null
  echo "Check cert dates of $SECRET in project $NS:"
  openssl x509 -noout --dates -in tls.crt; echo
done

$ cat certs.txt
openshift-kube-controller-manager-operator csr-signer-signer
openshift-kube-controller-manager-operator csr-signer
openshift-kube-controller-manager kube-controller-manager-client-cert-key
openshift-kube-apiserver-operator aggregator-client-signer
openshift-kube-apiserver aggregator-client
openshift-kube-apiserver external-loadbalancer-serving-certkey
openshift-kube-apiserver internal-loadbalancer-serving-certkey
openshift-kube-apiserver service-network-serving-certkey
openshift-config-managed kube-controller-manager-client-cert-key
openshift-config-managed kube-scheduler-client-cert-key
openshift-kube-scheduler kube-scheduler-client-cert-key

Checking the certs, they have one-day expiry times; this is as expected.
# ./check_secret_expiry.sh certs.txt
Check cert dates of csr-signer-signer in project openshift-kube-controller-manager-operator:
notBefore=Jun 27 04:41:38 2022 GMT
notAfter=Jun 28 04:41:38 2022 GMT

Check cert dates of csr-signer in project openshift-kube-controller-manager-operator:
notBefore=Jun 27 04:52:21 2022 GMT
notAfter=Jun 28 04:41:38 2022 GMT

Check cert dates of kube-controller-manager-client-cert-key in project openshift-kube-controller-manager:
notBefore=Jun 27 04:52:26 2022 GMT
notAfter=Jul 27 04:52:27 2022 GMT

Check cert dates of aggregator-client-signer in project openshift-kube-apiserver-operator:
notBefore=Jun 27 04:41:37 2022 GMT
notAfter=Jun 28 04:41:37 2022 GMT

Check cert dates of aggregator-client in project openshift-kube-apiserver:
notBefore=Jun 27 04:52:26 2022 GMT
notAfter=Jun 28 04:41:37 2022 GMT

Check cert dates of external-loadbalancer-serving-certkey in project openshift-kube-apiserver:
notBefore=Jun 27 04:52:26 2022 GMT
notAfter=Jul 27 04:52:27 2022 GMT

Check cert dates of internal-loadbalancer-serving-certkey in project openshift-kube-apiserver:
notBefore=Jun 27 04:52:49 2022 GMT
notAfter=Jul 27 04:52:50 2022 GMT

Check cert dates of service-network-serving-certkey in project openshift-kube-apiserver:
notBefore=Jun 27 04:52:28 2022 GMT
notAfter=Jul 27 04:52:29 2022 GMT

Check cert dates of kube-controller-manager-client-cert-key in project openshift-config-managed:
notBefore=Jun 27 04:52:26 2022 GMT
notAfter=Jul 27 04:52:27 2022 GMT

Check cert dates of kube-scheduler-client-cert-key in project openshift-config-managed:
notBefore=Jun 27 04:52:47 2022 GMT
notAfter=Jul 27 04:52:48 2022 GMT

Check cert dates of kube-scheduler-client-cert-key in project openshift-kube-scheduler:
notBefore=Jun 27 04:52:47 2022 GMT
notAfter=Jul 27 04:52:48 2022 GMT
# 

# cat check_secret_expiry_within.sh
#!/usr/bin/env bash
# usage: ./check_secret_expiry_within.sh 1day # or 15min, 2days, 2day, 2month, 1year
WITHIN=${1:-24hours}
echo "Checking validity within $WITHIN ..."
oc get secret --insecure-skip-tls-verify -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | . != null and fromdateiso8601<='$( date --date="+$WITHIN" +%s )') | "\(.metadata.annotations."auth.openshift.io/certificate-not-before")  \(.metadata.annotations."auth.openshift.io/certificate-not-after")  \(.metadata.namespace)\t\(.metadata.name)"'

# ./check_secret_expiry_within.sh 1day
Checking validity within 1day ...
2022-06-27T04:41:37Z  2022-06-28T04:41:37Z  openshift-kube-apiserver-operator	aggregator-client-signer
2022-06-27T04:52:26Z  2022-06-28T04:41:37Z  openshift-kube-apiserver	aggregator-client
2022-06-27T04:52:21Z  2022-06-28T04:41:38Z  openshift-kube-controller-manager-operator	csr-signer
2022-06-27T04:41:38Z  2022-06-28T04:41:38Z  openshift-kube-controller-manager-operator	csr-signer-signer

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

 

This is a clone of issue OCPBUGS-18469. The following is the description of the original issue:

Description of problem:

The image registry operator in Azure by default has two replicas.  Every 5 minutes, each of those replicas makes a call to the StorageAccount List operation for the image registry storage account.  

Azure has published limits for storage account throttling operations. These limits are 100 calls to List operations every 5 minutes per subscription and region pair.

Because of this, customers are limited to <50 clusters per subscription and region in Azure.  This number can change based on the number of image registry replicas as well as customer activity on List storage account operations within that subscription and region.  

On Azure Red Hat OpenShift managed service, we occasionally have customers exceeding these limits including internal customers for demos, preventing them from creating new clusters within the subscription & region due to these scaling limits.  

Version-Release number of selected component (if applicable):

N/A

How reproducible:

Always.  

Steps to Reproduce:

1. Scale up the number of image registry pods to hit the 100 / 5 minute List limit (50 replicas, or enough clusters within a given subscription & region)
2. Attempt to create a new cluster
3. Cluster installation may fail due to image-registry cluster operator never going healthy, or the installer not being able to generate a storage account key for the bootstrap node to fetch its ignition config.  

Actual results:

storage.AccountsClient#ListAccountSAS: Failure responding to request: StatusCode=429 -- Original Error: autorest/azure: Service returned an error. Status=429 Code="TooManyRequests" Message="The request is being throttled as the limit has been reached for operation type - Read_ObservationWindow_00:05:00. For more information, see - https://aka.ms/srpthrottlinglimits"

Expected results:

Cluster installs successfully

Additional info:

Raising this as a bug since this issue will be persistent across all cluster installations should one exceed the threshold.  It will also impact the image-registry pod health.  
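A possible mitigation sketch on an affected cluster, assuming reduced image registry availability is acceptable, is to lower the replica count and therefore the List call volume:

$ oc patch configs.imageregistry.operator.openshift.io/cluster --type=merge -p '{"spec":{"replicas":1}}'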

The "sufficient-masters-count' failed" test is intermittently failing due to a suspected race condition that causes as duplicate cluster event.

"Cluster validation 'sufficient-masters-count' that used to succeed is now failing"

The aim of this ticket is to ensure that this test does not flake

Description of problem:

During cluster installation if the host systems had multiple dual-stack interfaces configured via install-config.yaml, the installation will fail. Notably, when a single-stack ipv4 installation is attempted with multiple interfaces it is successful. Additionally, when a dual-stack installation is attempted with only a single interface it is successful.

Version-Release number of selected component (if applicable):

Reproduced on 4.12.1 and 4.12.7

How reproducible:

100%

Steps to Reproduce:

1. Assign an IPv4 and an IPv6 address to both the apiVIPs and ingressVIPs parameters in the install-config.yaml
2. Configure all hosts with at least two interfaces in the install-config.yaml
3. Assign an IPv4 and an IPv6 address to each interface in the install-config.yaml
4. Begin cluster installation and wait for failure

Actual results:

Failed cluster installation

Expected results:

Successful cluster installation

Additional info:

 

Description of problem:

Usually the etcd pod is named "etcd-bootstrap" for a multi-node install. In bootstrap-in-place mode the only master is not started during bootstrap, so it's useful to use the expected pod name during bootstrap. This would allow us to re-use the bootstrap-generated certificates on "real" master startup.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Description of problem:

MCO depends on the image registry; if the image registry is not installed, installation will fail because MCO goes degraded.

Version-Release number of selected component (if applicable):

payload image built from https://github.com/openshift/installer/pull/7421

How reproducible:

always

Steps to Reproduce:

1.Set "baselineCapabilitySet: None" when install a cluster, all the optional operators will not be installed.
2.
3.

Actual results:

09-01 15:50:34.770  level=error msg=Cluster operator machine-config Degraded is True with RenderConfigFailed: Failed to resync 4.14.0-0.ci.test-2023-08-31-033001-ci-ln-7xhl7yt-latest because: clusteroperators.config.openshift.io "image-registry" not found
09-01 15:50:34.770  level=error msg=Cluster operator machine-config Available is False with RenderConfigFailed: Cluster not available for [{operator 4.14.0-0.ci.test-2023-08-31-033001-ci-ln-7xhl7yt-latest}]: clusteroperators.config.openshift.io "image-registry" not found
09-01 15:50:34.770  level=info msg=Cluster operator network ManagementStateDegraded is False with : 
09-01 15:50:34.770  level=error msg=Cluster initialization failed because one or more operators are not functioning properly.

Expected results:

MCO should not be degraded if the image registry is not installed

Additional info:

must-gather log https://drive.google.com/file/d/1E3FbPcVwZxBi33tHq7pyaHc8EM3eiTUa/view?usp=drive_link 
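To confirm which capabilities actually ended up enabled on such a cluster, the ClusterVersion status can be inspected:

$ oc get clusterversion version -o jsonpath='{.status.capabilities.enabledCapabilities}'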

This is a clone of issue OCPBUGS-24035. The following is the description of the original issue:

Description of problem:

     
On an SNO, a new CA certificate is not loaded after updating the user-ca-bundle configmap, and as a result the cluster cannot pull images from a registry with a certificate signed by the new CA.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1. Update ca-bundle.crt (replace it with a new certificate if applicable) in the `user-ca-bundle` configmap under the openshift-config namespace:
  * On the node, ensure that /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt was updated with the new certificate
     2. Create a pod which uses an image from a registry that has its certificate signed by the new CA cert provided in ca-bundle.crt
     3.
    

Actual results:

    Pod fails to pull the image:
 *** Failed to pull image "registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000/centos/centos:8": rpc error: code = Unknown desc = pinging container registry registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000: Get "https://registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000/v2/": tls: failed to verify certificate: x509: certificate signed by unknown authority
  * On the node, try to reach the registry via curl https://registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000
 ** certificate validation fails when running: curl https://registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000
 curl: (60) SSL certificate problem: self-signed certificate
 More details here: https://curl.se/docs/sslcerts.html

 To be able to create a pod I had to:
  ** Run `sudo update-ca-trust`. After that, curl https://registry.ztp-hub-00.mobius.lab.eng.rdu2.redhat.com:5000 worked without issues, but the pod creation still failed due to the "tls: failed to verify certificate: x509: certificate signed by unknown authority" error
  ** Run `sudo systemctl restart crio`. After that, the pod creation succeeded and the image could be pulled
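A consolidated sketch of that manual workaround, run against the node (the node name is a placeholder):

$ oc debug node/<node-name> -- chroot /host /bin/sh -c 'update-ca-trust && systemctl restart crio'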

Expected results:

    

Additional info:

Attaching must gather    

Description of problem:

Once https://issues.redhat.com/browse/OCPBUGS-14783 is fixed, we find another issue which prevents the kube-apiserver's init-container from finishing successfully. The init-container tries to reach the kube-apiserver on an IPv4-based URL, which is not up; it should use the IPv6 one.

 

Description of problem:

 

Observation from CISv1.4 pdf:
1.1.9 Ensure that the Container Network Interface file permissions are set to 600 or more restrictive
“Container Network Interface provides various networking options for overlay networking.
You should consult their documentation and restrict their respective file permissions to maintain the integrity of those files. Those files should be writable by only the administrators on the system.”
 
To conform with the CIS benchmarks, the permissions of the /var/run/multus/cni/net.d/*.conf files on nodes should be updated to 600.

$ for i in $(oc get pods -n openshift-multus -l app=multus -oname); do oc exec -n openshift-multus $i -- /bin/bash -c "stat -c \"%a %n\" /host/var/run/multus/cni/net.d/*.conf"; done
644 /host/var/run/multus/cni/net.d/80-openshift-network.conf
644 /host/var/run/multus/cni/net.d/80-openshift-network.conf
644 /host/var/run/multus/cni/net.d/80-openshift-network.conf
644 /host/var/run/multus/cni/net.d/80-openshift-network.conf
644 /host/var/run/multus/cni/net.d/80-openshift-network.conf
644 /host/var/run/multus/cni/net.d/80-openshift-network.conf

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-20-215234

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

The file permissions of /var/run/multus/cni/net.d/*.conf on nodes is 644.

Expected results:

The file permissions of /var/run/multus/cni/net.d/*.conf on nodes should be updated to 600
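A one-off remediation sketch (node name is a placeholder; this does not persist if multus rewrites the file, so the permanent fix still has to come from the component itself):

$ oc debug node/<node-name> -- chroot /host /bin/sh -c 'chmod 600 /var/run/multus/cni/net.d/*.conf'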

Additional info:

 

This is a clone of issue OCPBUGS-26240. The following is the description of the original issue:

This is a clone of issue OCPBUGS-25406. The following is the description of the original issue:

Description of problem:

On a 4.14.5 fast-channel cluster in ARO, after the upgrade, when the customer tried to add a new node, the MachineConfig was not applied and the node never joined the pool. This happens for every node and can only be remediated by SRE, not the customer.
    

Version-Release number of selected component (if applicable):

4.14.5 -candidate
    

How reproducible:

Every time a node is added to the cluster at this version.
    

Steps to Reproduce:

    1. Install an ARO cluster
    2. Upgrade it to 4.14 along fast channel
    3. Add a node
    

Actual results:

 message: >-
        could not Create/Update MachineConfig: Operation cannot be fulfilled on
        machineconfigs.machineconfiguration.openshift.io
        "99-worker-generated-kubelet": the object has been modified; please
        apply your changes to the latest version and try again
      status: 'False'
      type: Failure
    - lastTransitionTime: '2023-11-29T17:44:37Z'

    

Expected results:

Node is created and configured correctly. 
    

Additional info:

 MissingStaticPodControllerDegraded: static pod lifecycle failure - static pod: "kube-apiserver" in namespace: "openshift-kube-apiserver" for revision: 15 on node: "aro-cluster-REDACTED-master-0" didn't show up, waited: 4m45s
    

Please review the following PR: https://github.com/openshift/machine-api-operator/pull/1127

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-21876. The following is the description of the original issue:

Description of problem:

If pipefail is active in a bash script, the pipe ( | ) usage can hide the actual error of the ip command if it fails with an exit code different from 1.

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

On OpenStack we create SG rules opening traffic from `0.0.0.0/0` on NodePorts. This was required for the OVN load balancers to work properly, as they keep the source IP of the traffic when it reaches the LB members. This isn't needed anymore: in 4.14, OSASINFRA-3067 implemented and enabled the `manage-security-groups` option in cloud-provider-openstack, so it creates and attaches the proper SG on its own to make sure only the necessary NodePorts are open.

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:

1. Check for existence of rules opening traffic from 0.0.0.0/0 on the master and worker nodes.
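A quick way to check, assuming the security groups follow the usual <cluster-id>-master / <cluster-id>-worker naming:

$ openstack security group rule list <cluster-id>-master | grep 0.0.0.0/0
$ openstack security group rule list <cluster-id>-worker | grep 0.0.0.0/0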

Actual results:

Rules are still there.

Expected results:

Rules are not needed anymore.

Additional info:


 This issue is valid for UI and API.
For UI
If a new cluster is being created and s390x is selected as the architecture, an error message pops up when the Next button is pressed (all other necessary values are filled in correctly):

"cannot use Minimal ISO because it's not compatible with the s390x architecture on version 4.13.0-rc.3-multi of OpenShift"

There is no workaround, because the matching selection (full-iso or iPXE) can only be set in the addHosts dialog.

For API
The infra env object cannot be created if the type is not set. The error message:
"cannot use Minimal ISO because it's not compatible with the s390x architecture on version 4.13.0-rc.3-multi of OpenShift"
is returned.

Workaround is to set image_type to "full-iso" during infra env creation.

For the s390x architecture, the default should always be full-iso.

Description of problem:

Unit test failing 

=== RUN   TestNewAppRunAll/app_generation_using_context_dir
    newapp_test.go:907: app generation using context dir: Error mismatch! Expected <nil>, got supplied context directory '2.0/test/rack-test-app' does not exist in 'https://github.com/openshift/sti-ruby'
    --- FAIL: TestNewAppRunAll/app_generation_using_context_dir (0.61s)


Version-Release number of selected component (if applicable):

 

How reproducible:

100

Steps to Reproduce:

see for example https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/1376/pull-ci-openshift-oc-master-images/1638172620648091648 

Actual results:

unit tests fail

Expected results:

TestNewAppRunAll unit test should pass

Additional info:

 

Description of problem:

The cluster network operator crashes in an IBM ROKS with the following error:
2023-06-07T12:21:37.402285420-05:00 stderr F I0607 17:21:37.402108       1 log.go:198] Failed to render: failed to render multus admission controller manifests: failed to render file bindata/network/multus-admission-controller/admission-controller.yaml: failed to render manifest bindata/network/multus-admission-controller/admission-controller.yaml: template: bindata/network/multus-admission-controller/admission-controller.yaml:199:12: executing "bindata/network/multus-admission-controller/admission-controller.yaml" at <.HCPNodeSelector>: map has no entry for key "HCPNodeSelector"

Version-Release number of selected component (if applicable):

4.13.1

How reproducible:

Always

Steps to Reproduce:

1. Run a ROKS cluster with OCP 4.13.1
2.
3.

Actual results:

CNO crashes

Expected results:

CNO functions normally

Additional info:

ROKS worked ok with 4.13.0
This change was introduced in 4.13.1:
https://github.com/openshift/cluster-network-operator/pull/1802

Please review the following PR: https://github.com/openshift/cluster-etcd-operator/pull/1042

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When updating s390x cluster from 4.10.35 to 4.11.34, i got following message in the UI:

Updating this cluster to 4.11.34 is supported, but not recommended as it might not be optimized for some components in this cluster.

Exposure to KeepalivedMulticastSkew is unknown due to an evaluation failure: client-side throttling: only 9m20.476632575s has elapsed since the last match call completed for this cluster condition backend; this cached cluster condition request has been queued for later execution
On OpenStack, oVirt, and vSphere infrastructure, updates to 4.11 can cause degraded cluster operators as a result of a multicast-to-unicast keepalived transition, until all nodes have updated to 4.11. https://access.redhat.com/solutions/7007826

As we discussed on Slack [1], the message could be more user friendly, something like this [2]:

"Throttling risk evaluation, 2 risks to evaluate, next evaluation in 9m59s."

[1] https://redhat-internal.slack.com/archives/CEGKQ43CP/p1683621220358259
[2] https://redhat-internal.slack.com/archives/CEGKQ43CP/p1683643286581299?thread_ts=1683621220.358259&cid=CEGKQ43CP

Version-Release number of selected component (if applicable):

4.11.34

How reproducible:

Have a cluster on 4.10.35 or i guess any 4.10.z and update to 4.11.34

Steps to Reproduce: 

1. Open webconsole
2. On the dashboard/Overview click on Update cluster
3. Change the channel to stable-4.11
4. Select new version and from the drop down menu click on Include supported but not recommended versions
5. Select 4.11.34
6. Message from the problem description appears 

Actual results:

Unclear message

Expected results:

Clear message

Description of problem:

kubevirt digest missing from RHCOS boot image

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

Unable to create kubevirt cluster

Expected results:

Able to create kubevirt cluster

Additional info:

 

Description of problem:

In hypershift context:
Operands managed by Operators running in the hosted control plane namespace in the management cluster do not honour affinity opinions https://hypershift-docs.netlify.app/how-to/distribute-hosted-cluster-workloads/
https://github.com/openshift/hypershift/blob/main/support/config/deployment.go#L263-L265

These operands running management side should honour the same affinity, tolerations, node selector and priority rules than the operator.
This could be done by looking at the operator deployment itself or at the HCP resource.

multus-admission-controller
cloud-network-config-controller
ovnkube-master
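A quick way to compare one of these operands with the operator's own scheduling constraints, assuming multus-admission-controller is deployed as a Deployment in the hosted control plane namespace (the namespace name is a placeholder):

$ oc -n <hcp-namespace> get deployment multus-admission-controller \
    -o jsonpath='{.spec.template.spec.nodeSelector}{"\n"}{.spec.template.spec.affinity}{"\n"}'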

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Create a hypershift cluster.
2. Check affinity rules and node selector of the operands above.
3.

Actual results:

Operands are missing affinity rules and node selector

Expected results:

Operands have the same affinity rules and node selector as the operator

Additional info:

 

Description of problem:

Starting with 4.12.0-0.nightly-2023-03-13-172313, the machine API operator began receiving an invalid version tag, either due to a missing or invalid VERSION_OVERRIDE (https://github.com/openshift/machine-api-operator/blob/release-4.12/hack/go-build.sh#L17-L20) value being passed to the build.

This is resulting in all jobs invoked by the 4.12 nightlies failing to install.

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2023-03-13-172313 and later

How reproducible:

Consistently, in 4.12 nightlies only (CI builds do not seem to be impacted).

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

Example of failure https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.12-e2e-aws-csi/1635331349046890496/artifacts/e2e-aws-csi/gather-extra/artifacts/pods/openshift-machine-api_machine-api-operator-866d7647bd-6lhl4_machine-api-operator.log

Please review the following PR: https://github.com/openshift/kubernetes-autoscaler/pull/255

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The following tests broke the payload for CI and nightly

[sig-network][Feature:MultiNetworkPolicy][Serial] should enforce a network policies on secondary network IPv6 [Suite:openshift/conformance/serial]

[sig-network][Feature:MultiNetworkPolicy][Serial] should enforce a network policies on secondary network IPv4 [Suite:openshift/conformance/serial]

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

Test Panicked: runtime error: invalid memory address or nil pointer dereference

Expected results:

 

Additional info:

Original PR that broke the payload https://github.com/openshift/origin/pull/27795 

Revert to get payloads back to normal https://github.com/openshift/origin/pull/27926

Broken payloads and related jobs and sippy link for additional info

https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.ci/release/4.14.0-0.ci-2023-05-17-212447

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1659065324743430144

https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-05-18-040905

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-serial/1659088328617627648
https://sippy.dptools.openshift.org/sippy-ng/tests/4.14?filters=%257B%2522items%2522%253A%255B%257B%2522columnField%2522%253A%2522current_runs%2522%252C%2522operatorValue%2522%253A%2522%253E%253D%2522%252C%2522value%2522%253A%25227%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Atrue%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522never-stable%2522%257D%252C%257B%2522columnField%2522%253A%2522variants%2522%252C%2522not%2522%253Atrue%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522aggregated%2522%257D%252C%257B%2522id%2522%253A99%252C%2522columnField%2522%253A%2522name%2522%252C%2522operatorValue%2522%253A%2522contains%2522%252C%2522value%2522%253A%2522%255Bsig-network%255D%255BFeature%253AMultiNetworkPolicy%255D%255BSerial%255D%2520should%2520enforce%2520a%2520network%2520policies%2520on%2520secondary%2520network%2520IPv6%2520%255BSuite%253Aopenshift%252Fconformance%252Fserial%255D%2522%257D%255D%252C%2522linkOperator%2522%253A%2522and%2522%257D&sort=asc&sortField=current_working_percentage

This is a clone of issue OCPBUGS-23108. The following is the description of the original issue:

Description of problem:

Code calls secrets instead of configmaps

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-19708. The following is the description of the original issue:

Description of problem:

When we create a MachineConfig that declares the same kernel argument twice, MCO adds it only once.

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2023-09-22-181920   True        False         5h18m   Cluster version is 4.12.0-0.nightly-2023-09-22-181920

We have seen this behavior in 4.15 too 4.15.0-0.nightly-2023-09-22-224720

How reproducible:

Always

Steps to Reproduce:

1. Create a MC that declares 2 kernel arguments with the same value (z=4 is duplicated)

 apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-kernel-arguments-32-zparam
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
    - y=0
    - z=4
    - y=1
    - z=4


Actual results:

We get the following parameters

$ oc debug -q node/sergio-v12-9vwrc-worker-c-tpbvh.c.openshift-qe.internal  -- chroot /host cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-a594b3a14778ce39f2b42ddb90e933c1971268a746ef1678a3c6eedee5a21b00/vmlinuz-4.18.0-372.73.1.el8_6.x86_64 ostree=/ostree/boot.0/rhcos/a594b3a14778ce39f2b42ddb90e933c1971268a746ef1678a3c6eedee5a21b00/0 ignition.platform.id=gcp console=ttyS0,115200n8 root=UUID=e101e976-e029-411d-ad71-6856f3838c4f rw rootflags=prjquota boot=UUID=75598fe5-c10d-4e95-9747-1708d9fe6a10 console=tty0 y=0 z=4 y=1

There is only one "z=4" parameter. We should see "y=0 z=4 y=1 z=4" instead of "y=0 z=4 y=1"

Expected results:

In older versions we can see that the duplicated parameters are created

For example, this is the output in a IPI on AWS 4.9 cluster

$ oc debug -q node/ip-10-0-189-69.us-east-2.compute.internal -- chroot /host cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-e1eeff6ec1b9b70a3554779947906f4a7fb93e0d79fbefcb045da550b7d9227f/vmlinuz-4.18.0-305.97.1.el8_4.x86_64 random.trust_cpu=on console=tty0 console=ttyS0,115200n8 ostree=/ostree/boot.1/rhcos/e1eeff6ec1b9b70a3554779947906f4a7fb93e0d79fbefcb045da550b7d9227f/0 ignition.platform.id=aws root=UUID=ed307195-b5a9-4160-8a7a-df42aa734c28 rw rootflags=prjquota y=0 z=4 y=1 z=4


All the parameters are created, including the duplicated "z=4".
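
To illustrate the difference in behaviour, here is a minimal Go sketch (not MCO code; the function names are illustrative) contrasting an order-preserving append, which keeps the duplicated z=4, with a set-based dedup, which silently drops it:

package main

import "fmt"

// appendArgs keeps every argument in order, duplicates included,
// which is the behaviour expected by this bug report.
func appendArgs(args []string) []string {
	out := make([]string, 0, len(args))
	out = append(out, args...)
	return out
}

// dedupArgs drops repeated arguments, which matches the Actual results above.
func dedupArgs(args []string) []string {
	seen := map[string]bool{}
	out := []string{}
	for _, a := range args {
		if !seen[a] {
			seen[a] = true
			out = append(out, a)
		}
	}
	return out
}

func main() {
	args := []string{"y=0", "z=4", "y=1", "z=4"}
	fmt.Println(appendArgs(args)) // [y=0 z=4 y=1 z=4]
	fmt.Println(dedupArgs(args))  // [y=0 z=4 y=1]
}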

Additional info:

 

This is a clone of issue OCPBUGS-21626. The following is the description of the original issue:

Description: If tokenConfig.accessTokenInactivityTimeout is set to less than 300s, the accessTokenInactivityTimeout does not take effect in the hosted cluster, whereas on the management cluster the following error is returned when trying to set the timeout below 300s:

spec.tokenConfig.accessTokenInactivityTimeout: Invalid value: v1.Duration{Duration:100000000000}: the minimum acceptable token timeout value is 300 seconds*

Steps to reproduce the issue:

1. Install a fresh 4.15 hypershift cluster  
2. Configure accessTokenInactivityTimeout as below:
$ oc edit hc -n clusters
...
  spec:
    configuration:
      oauth:
        identityProviders:
        ...
        tokenConfig:          
          accessTokenInactivityTimeout: 100s
...
3. Wait for the oauth pods to redeploy and check the oauth cm for updated accessTokenInactivityTimeout value:
$ oc get cm oauth-openshift -oyaml -n clusters-hypershift-ci-xxxxx 
...
        tokenConfig:           
          accessTokenInactivityTimeout: 1m40s
...
4. Login to guest cluster with testuser-1 and get the token
$ oc login https://a889<...>:6443 -u testuser-1 -p xxxxxxx
$ TOKEN=`oc whoami -t`

Actual result:

Wait for 100s and try login with the TOKEN
$ oc login --token="$TOKEN"
WARNING: Using insecure TLS client config. Setting this option is not supported!
Logged into "https://a889<...>:6443" as "testuser-1" using the token provided.
You don't have any projects. You can try to create a new project, by running
    oc new-project <projectname>

Expected result:

1. Login fails if the user is not active within the accessTokenInactivityTimeout seconds.

2. On the management cluster, the following error is returned when trying to set the timeout to less than 300s:
spec.tokenConfig.accessTokenInactivityTimeout: Invalid value: v1.Duration{Duration:100000000000}: the minimum acceptable token timeout value is 300 seconds*
The same validation should be implemented in the hosted cluster.
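
A minimal Go sketch of the kind of check the hosted-cluster path could apply (this is not the HyperShift code; the helper name is hypothetical, and only the 300-second floor quoted above is taken from the report):

package main

import (
	"fmt"
	"time"
)

const minAccessTokenInactivityTimeout = 300 * time.Second

// validateAccessTokenInactivityTimeout is a hypothetical helper that mirrors
// the management-cluster validation quoted above: anything below 300s is rejected.
func validateAccessTokenInactivityTimeout(d time.Duration) error {
	if d < minAccessTokenInactivityTimeout {
		return fmt.Errorf("spec.tokenConfig.accessTokenInactivityTimeout: Invalid value: %v: the minimum acceptable token timeout value is 300 seconds", d)
	}
	return nil
}

func main() {
	// 100s, as configured in the reproduction steps, should be rejected.
	if err := validateAccessTokenInactivityTimeout(100 * time.Second); err != nil {
		fmt.Println(err)
	}
}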

Description of problem:

When running the local bridge with auth disabled, we see the error
GET http://localhost:9000/api/request-token 404 (Not Found)

Version-Release number of selected component (if applicable):

latest master

How reproducible:

Always

Steps to Reproduce:

1. fetch latest openshift/console code and build
2. run local bridge './bin/bridge'
3.

Actual results:

When visiting localhost:9000, we see errors: GET http://localhost:9000/api/request-token 404 (Not Found)

Expected results:

Maybe we should skip the /api/request-token request when auth is disabled, as suggested in https://github.com/openshift/console/pull/12553#discussion_r1103151813

Additional info:

 

 

The aggregated https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-gcp-ovn-rt-upgrade-4.14-minor-release-openshift-release-analysis-aggregator/1633554110798106624 job failed.  Digging into one of them:

 

This MCD log contains the relevant errors: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade/1633554106595414016/artifacts/e2e-gcp-ovn-rt-upgrade/gather-extra/artifacts/pods/openshift-machine-config-operator_machine-config-daemon-p2vf4_machine-config-daemon.log

 

Deployments:
* ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4f28fbcd049025bab9719379492420f9eaab0426cdbbba43b395eb8421f10a17
                   Digest: sha256:4f28fbcd049025bab9719379492420f9eaab0426cdbbba43b395eb8421f10a17
                  Version: 413.86.202302230536-0 (2023-03-08T20:10:47Z)
      RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-372.43.1.el8_6
          LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules
                           kernel-rt-modules-extra
...
E0308 22:11:21.925030 74176 writer.go:200] Marking Degraded due to: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cd299b2bf3cc98fb70907f152b4281633064fe33527b5d6a42ddc418ff00eec1 : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cd299b2bf3cc98fb70907f152b4281633064fe33527b5d6a42ddc418ff00eec1: error: Importing: remote error: fetching blob: received unexpected HTTP status: 500 Internal Server Error
... 
I0308 22:11:36.959143   74176 update.go:2010] Running: rpm-ostree override reset kernel kernel-core kernel-modules kernel-modules-extra --uninstall kernel-rt-core --uninstall kernel-rt-kvm --uninstall kernel-rt-modules --uninstall kernel-rt-modules-extra
...
E0308 22:12:35.525156   74176 writer.go:200] Marking Degraded due to: error running rpm-ostree override reset kernel kernel-core kernel-modules kernel-modules-extra --uninstall kernel-rt-core --uninstall kernel-rt-kvm --uninstall kernel-rt-modules --uninstall kernel-rt-modules-extra: error: Package/capability 'kernel-rt-core' is not currently requested
: exit status 1
  

 

Something is going wrong here in our retry loop. I think it might be that we don't clear the pending deployment on failure. In other words, we need to run

rpm-ostree cleanup -p

before we retry.
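
A minimal Go sketch of that idea, using a hypothetical retry helper that shells out with os/exec (this is not the MCD implementation):

package main

import (
	"fmt"
	"os/exec"
)

// rebaseWithCleanup is a hypothetical helper: before retrying a failed
// "rpm-ostree rebase", it clears the pending deployment so the retry
// does not trip over the half-applied one.
func rebaseWithCleanup(imageRef string, attempts int) error {
	var lastErr error
	for i := 0; i < attempts; i++ {
		if i > 0 {
			// Drop the pending deployment left behind by the previous failure.
			if out, err := exec.Command("rpm-ostree", "cleanup", "-p").CombinedOutput(); err != nil {
				return fmt.Errorf("cleanup failed: %v: %s", err, out)
			}
		}
		if out, err := exec.Command("rpm-ostree", "rebase", "--experimental", imageRef).CombinedOutput(); err != nil {
			lastErr = fmt.Errorf("rebase failed: %v: %s", err, out)
			continue
		}
		return nil
	}
	return lastErr
}

func main() {
	// Placeholder image reference for illustration only.
	fmt.Println(rebaseWithCleanup("ostree-unverified-registry:example.invalid/release@sha256:...", 3))
}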

 

This is fallout from https://github.com/openshift/machine-config-operator/pull/3580 - Although I suspect it may have been an issue before too.

 

Please review the following PR: https://github.com/openshift/aws-ebs-csi-driver/pull/220

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The HostedCluster name is not currently validated against RFC1123.

Version-Release number of selected component (if applicable):

 

How reproducible:

Every time

Steps to Reproduce:

1.
2.
3.

Actual results:

Any HostedCluster name is allowed

Expected results:

Only HostedCluster names meeting RFC1123 validation should be allowed.
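
A minimal sketch of that validation using the standard apimachinery helpers (whether the exact rule should be an RFC 1123 label or subdomain is an assumption, and the webhook wiring is not shown):

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation"
)

// validateHostedClusterName rejects names that are not valid RFC 1123 labels,
// which is what the Expected results above call for.
func validateHostedClusterName(name string) error {
	if errs := validation.IsDNS1123Label(name); len(errs) > 0 {
		return fmt.Errorf("invalid HostedCluster name %q: %v", name, errs)
	}
	return nil
}

func main() {
	fmt.Println(validateHostedClusterName("my-cluster"))   // <nil>
	fmt.Println(validateHostedClusterName("My_Cluster!!")) // rejected
}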

Additional info:

 

This is a clone of issue OCPBUGS-17199. The following is the description of the original issue:

This is case 2 from OCPBUGS-14673.

Description of problem:

MHC for the control plane does not work correctly in some cases.

Case 2: Stop the kubelet service on the master node; the new master gets Running, the old one is stuck in Deleting, and many cluster operators degrade.

This is a regression: when I tested this on 4.12 around September 2022, case 2 and case 3 worked correctly.
https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-54326

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-05-112833
4.13.0-0.nightly-2023-06-06-194351
4.12.0-0.nightly-2023-06-07-005319

How reproducible:

Always

Steps to Reproduce:

1.Create MHC for control plane

apiVersion: machine.openshift.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: control-plane-health
  namespace: openshift-machine-api
spec:
  maxUnhealthy: 1
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-machine-type: master
  unhealthyConditions:
  - status: "False"
    timeout: 300s
    type: Ready
  - status: "Unknown"
    timeout: 300s
    type: Ready


liuhuali@Lius-MacBook-Pro huali-test % oc create -f mhc-master3.yaml 
machinehealthcheck.machine.openshift.io/control-plane-health created
liuhuali@Lius-MacBook-Pro huali-test % oc get mhc
NAME                              MAXUNHEALTHY   EXPECTEDMACHINES   CURRENTHEALTHY
control-plane-health              1              3                  3
machine-api-termination-handler   100%           0                  0 

Case 2: Stop the kubelet service on the master node; the new master gets Running, the old one is stuck in Deleting, and many cluster operators degrade.
liuhuali@Lius-MacBook-Pro huali-test % oc debug node/huliu-az7c-svq9q-master-1 
Starting pod/huliu-az7c-svq9q-master-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.0.6
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-5.1# systemctl stop kubelet


Removing debug pod ...
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                                   STATUS   ROLES                  AGE   VERSION
huliu-az7c-svq9q-master-1              Ready    control-plane,master   95m   v1.26.5+7a891f0
huliu-az7c-svq9q-master-2              Ready    control-plane,master   95m   v1.26.5+7a891f0
huliu-az7c-svq9q-master-c96k8-0        Ready    control-plane,master   19m   v1.26.5+7a891f0
huliu-az7c-svq9q-worker-westus-5r8jf   Ready    worker                 34m   v1.26.5+7a891f0
huliu-az7c-svq9q-worker-westus-k747l   Ready    worker                 47m   v1.26.5+7a891f0
huliu-az7c-svq9q-worker-westus-r2vdn   Ready    worker                 83m   v1.26.5+7a891f0
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                   PHASE     TYPE              REGION   ZONE   AGE
huliu-az7c-svq9q-master-1              Running   Standard_D8s_v3   westus          97m
huliu-az7c-svq9q-master-2              Running   Standard_D8s_v3   westus          97m
huliu-az7c-svq9q-master-c96k8-0        Running   Standard_D8s_v3   westus          23m
huliu-az7c-svq9q-worker-westus-5r8jf   Running   Standard_D4s_v3   westus          39m
huliu-az7c-svq9q-worker-westus-k747l   Running   Standard_D4s_v3   westus          53m
huliu-az7c-svq9q-worker-westus-r2vdn   Running   Standard_D4s_v3   westus          91m
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                                   STATUS     ROLES                  AGE     VERSION
huliu-az7c-svq9q-master-1              NotReady   control-plane,master   107m    v1.26.5+7a891f0
huliu-az7c-svq9q-master-2              Ready      control-plane,master   107m    v1.26.5+7a891f0
huliu-az7c-svq9q-master-c96k8-0        Ready      control-plane,master   32m     v1.26.5+7a891f0
huliu-az7c-svq9q-master-jdhgg-1        Ready      control-plane,master   2m10s   v1.26.5+7a891f0
huliu-az7c-svq9q-worker-westus-5r8jf   Ready      worker                 46m     v1.26.5+7a891f0
huliu-az7c-svq9q-worker-westus-k747l   Ready      worker                 59m     v1.26.5+7a891f0
huliu-az7c-svq9q-worker-westus-r2vdn   Ready      worker                 95m     v1.26.5+7a891f0
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                   PHASE      TYPE              REGION   ZONE   AGE
huliu-az7c-svq9q-master-1              Deleting   Standard_D8s_v3   westus          110m
huliu-az7c-svq9q-master-2              Running    Standard_D8s_v3   westus          110m
huliu-az7c-svq9q-master-c96k8-0        Running    Standard_D8s_v3   westus          36m
huliu-az7c-svq9q-master-jdhgg-1        Running    Standard_D8s_v3   westus          5m55s
huliu-az7c-svq9q-worker-westus-5r8jf   Running    Standard_D4s_v3   westus          52m
huliu-az7c-svq9q-worker-westus-k747l   Running    Standard_D4s_v3   westus          65m
huliu-az7c-svq9q-worker-westus-r2vdn   Running    Standard_D4s_v3   westus          103m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                   PHASE      TYPE              REGION   ZONE   AGE
huliu-az7c-svq9q-master-1              Deleting   Standard_D8s_v3   westus          3h
huliu-az7c-svq9q-master-2              Running    Standard_D8s_v3   westus          3h
huliu-az7c-svq9q-master-c96k8-0        Running    Standard_D8s_v3   westus          105m
huliu-az7c-svq9q-master-jdhgg-1        Running    Standard_D8s_v3   westus          75m
huliu-az7c-svq9q-worker-westus-5r8jf   Running    Standard_D4s_v3   westus          122m
huliu-az7c-svq9q-worker-westus-k747l   Running    Standard_D4s_v3   westus          135m
huliu-az7c-svq9q-worker-westus-r2vdn   Running    Standard_D4s_v3   westus          173m
liuhuali@Lius-MacBook-Pro huali-test % oc get node   
NAME                                   STATUS     ROLES                  AGE    VERSION
huliu-az7c-svq9q-master-1              NotReady   control-plane,master   178m   v1.26.5+7a891f0
huliu-az7c-svq9q-master-2              Ready      control-plane,master   178m   v1.26.5+7a891f0
huliu-az7c-svq9q-master-c96k8-0        Ready      control-plane,master   102m   v1.26.5+7a891f0
huliu-az7c-svq9q-master-jdhgg-1        Ready      control-plane,master   72m    v1.26.5+7a891f0
huliu-az7c-svq9q-worker-westus-5r8jf   Ready      worker                 116m   v1.26.5+7a891f0
huliu-az7c-svq9q-worker-westus-k747l   Ready      worker                 129m   v1.26.5+7a891f0
huliu-az7c-svq9q-worker-westus-r2vdn   Ready      worker                 165m   v1.26.5+7a891f0
liuhuali@Lius-MacBook-Pro huali-test % oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.13.0-0.nightly-2023-06-06-194351   True        True          True       107m    APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()...
baremetal                                  4.13.0-0.nightly-2023-06-06-194351   True        False         False      174m    
cloud-controller-manager                   4.13.0-0.nightly-2023-06-06-194351   True        False         False      176m    
cloud-credential                           4.13.0-0.nightly-2023-06-06-194351   True        False         False      3h      
cluster-autoscaler                         4.13.0-0.nightly-2023-06-06-194351   True        False         False      173m    
config-operator                            4.13.0-0.nightly-2023-06-06-194351   True        False         False      175m    
console                                    4.13.0-0.nightly-2023-06-06-194351   True        False         False      136m    
control-plane-machine-set                  4.13.0-0.nightly-2023-06-06-194351   True        False         False      71m     
csi-snapshot-controller                    4.13.0-0.nightly-2023-06-06-194351   True        False         False      174m    
dns                                        4.13.0-0.nightly-2023-06-06-194351   True        True          False      173m    DNS "default" reports Progressing=True: "Have 6 available node-resolver pods, want 7."
etcd                                       4.13.0-0.nightly-2023-06-06-194351   True        True          True       173m    NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
image-registry                             4.13.0-0.nightly-2023-06-06-194351   True        True          False      165m    Progressing: The registry is ready...
ingress                                    4.13.0-0.nightly-2023-06-06-194351   True        False         False      165m    
insights                                   4.13.0-0.nightly-2023-06-06-194351   True        False         False      168m    
kube-apiserver                             4.13.0-0.nightly-2023-06-06-194351   True        True          True       171m    NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-controller-manager                    4.13.0-0.nightly-2023-06-06-194351   True        False         True       171m    NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-scheduler                             4.13.0-0.nightly-2023-06-06-194351   True        False         True       171m    NodeControllerDegraded: The master nodes not ready: node "huliu-az7c-svq9q-master-1" not ready since 2023-06-07 08:47:34 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-storage-version-migrator              4.13.0-0.nightly-2023-06-06-194351   True        False         False      106m    
machine-api                                4.13.0-0.nightly-2023-06-06-194351   True        False         False      167m    
machine-approver                           4.13.0-0.nightly-2023-06-06-194351   True        False         False      174m    
machine-config                             4.13.0-0.nightly-2023-06-06-194351   False       False         True       60m     Cluster not available for [{operator 4.13.0-0.nightly-2023-06-06-194351}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. status: (desired: 7, updated: 7, ready: 6, unavailable: 1)]
marketplace                                4.13.0-0.nightly-2023-06-06-194351   True        False         False      174m    
monitoring                                 4.13.0-0.nightly-2023-06-06-194351   True        False         False      106m    
network                                    4.13.0-0.nightly-2023-06-06-194351   True        True          False      177m    DaemonSet "/openshift-network-diagnostics/network-check-target" is not available (awaiting 1 nodes)...
node-tuning                                4.13.0-0.nightly-2023-06-06-194351   True        False         False      173m    
openshift-apiserver                        4.13.0-0.nightly-2023-06-06-194351   True        True          True       107m    APIServerDeploymentDegraded: 1 of 4 requested instances are unavailable for apiserver.openshift-apiserver ()
openshift-controller-manager               4.13.0-0.nightly-2023-06-06-194351   True        False         False      170m    
openshift-samples                          4.13.0-0.nightly-2023-06-06-194351   True        False         False      167m    
operator-lifecycle-manager                 4.13.0-0.nightly-2023-06-06-194351   True        False         False      174m    
operator-lifecycle-manager-catalog         4.13.0-0.nightly-2023-06-06-194351   True        False         False      174m    
operator-lifecycle-manager-packageserver   4.13.0-0.nightly-2023-06-06-194351   True        False         False      168m    
service-ca                                 4.13.0-0.nightly-2023-06-06-194351   True        False         False      175m    
storage                                    4.13.0-0.nightly-2023-06-06-194351   True        True          False      174m    AzureDiskCSIDriverOperatorCRProgressing: AzureDiskDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods...
liuhuali@Lius-MacBook-Pro huali-test % 

-----------------------

There might be an easier way by just rolling a revision in etcd, stopping kubelet and then observing the same issue.

Actual results:

CEO's member removal controller is getting stuck on the IsBootstrapComplete check that was introduced to fix another bug: 

 https://github.com/openshift/cluster-etcd-operator/commit/c96150992a8aba3654835787be92188e947f557c#diff-d91047e39d2c1ab6b35e69359a24e83c19ad9b3e9ad4e44f9b1ac90e50f7b650R97 

It turns out IsBootstrapComplete checks whether a revision is currently rolling out (which makes sense), and that one NotReady node with kubelet gone still has a revision in progress (rev 7, target 9).

more info: https://issues.redhat.com/browse/OCPBUGS-14673?focusedId=22726712&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-22726712

This causes the etcd member to not be removed. 

This in turn blocks the vertical scale-down procedure from removing the pre-drain hook, as the member is still present. Effectively you end up with a cluster of four control-plane machines, one of which is stuck in the Deleting state.

 

Expected results:

The etcd member should be removed and the machine/node should be deleted

Additional info:

Removing the revision check does fix this issue reliably, but might not be desirable:
https://github.com/openshift/cluster-etcd-operator/pull/1087  

Description of problem:

The DNS egress router must run as privileged. Given that it is just haproxy, this doesn't make much sense.

If I am not wrong, the biggest reason it needs privileged mode is the {{chroot}} option inherited from the default file (https://github.com/openshift/images/blob/master/egress/dns-proxy/egress-dns-proxy.sh#L44). That option doesn't make much sense when we are already inside a container (which is why ingress controllers don't use it, for example).

So it may be worth exploring whether this option can be removed so that the DNS egress router can run without privileged mode, perhaps needing only CAP_NET_BIND_SERVICE.

Version-Release number of selected component (if applicable):

4.12.0

How reproducible:

Always

Steps to Reproduce:

1. Forget to set privileged mode in the container
2.
3.

Actual results:

Pod cannot start due to chroot setting. I need to run the container as privileged, which lowers security too much.

Expected results:

Run the container without being privileged, maybe adding CAP_NET_BIND_SERVICE.

Additional info:


When creating a deployment with `oc new-app` and `--import-mode=PreserveOriginal`, container ports that are present in the Dockerfile do not get propagated to the deployment's `spec.containers[i].ports[i].containerPort`.

On further inspection, this is because the config object passed from the image to the deployment does not contain these details. The image reference in this case is a manifest-listed image, which does not carry the Docker metadata; it needs to be derived from the child manifest instead.

Description of problem:

This test tends to be flaky, depending on how the cert changes are propagated. We rotate 2 of the 7 certs in the bundle; if the changes don't get batched together, the assertion that verifies the result after the cert rotation runs too soon, causing the test to fail.

Version-Release number of selected component (if applicable):

4.14.0

Description of problem:

Create a new host folder qe-cluster under the datacenter, and move the cluster 'workloads' into that folder.

$ govc find -type r
/OCP-DC/host/qe-cluster/workloads

Use the install-config.yaml file below to create a single-zone cluster:

apiVersion: v1
baseDomain: qe.devcluster.openshift.com
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform: 
    vsphere:
      cpus: 4
      memoryMB: 8192
      osDisk:
        diskSizeGB: 60
      zones:
        - us-east-1
  replicas: 2
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    vsphere: 
      cpus: 4
      memoryMB: 16384 
      osDisk:
        diskSizeGB: 60
      zones:
        - us-east-1
  replicas: 3
metadata:
  name: jima-permission
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.19.46.0/24
  networkType: OVNKubernetes
  serviceNetwork:
  - 172.30.0.0/16
platform:
  vsphere:
    apiVIP: 10.19.46.99
    cluster: qe-cluster/workloads
    datacenter: OCP-DC
    defaultDatastore: my-nfs
    ingressVIP: 10.19.46.98
    network: "VM Network"
    username: administrator@vsphere.local
    password: xxx
    vCenter: xxx
    vcenters:
    - server: xxx
      user: administrator@vsphere.local
      password: xxx
      datacenters:
      - OCP-DC
    failureDomains:
    - name: us-east-1
      region: us-east
      zone: us-east-1a
      topology:
        datacenter: OCP-DC
        computeCluster: /OCP-DC/host/qe-cluster/workloads
        networks:
        - "VM Network"
        datastore: my-nfs
      server: xxx
pullSecret: xxx 

The installer reports an error:

$ ./openshift-install create cluster --dir ipi5 --log-level debug
DEBUG   Generating Platform Provisioning Check...  
DEBUG   Fetching Common Manifests...               
DEBUG   Reusing previously-fetched Common Manifests 
DEBUG Generating Terraform Variables...            
FATAL failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": failed to get vSphere network ID: could not find vSphere cluster at /OCP-DC/host//OCP-DC/host/qe-cluster/workloads: cluster '/OCP-DC/host//OCP-DC/host/qe-cluster/workloads' not found 
 

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2022-10-05-053337

How reproducible:

always

Steps to Reproduce:

1. Create a new host/cluster folder under the datacenter, and move the vSphere cluster into that folder
2. Prepare an install-config with a zone configuration
3. Deploy the cluster

Actual results:

Cluster creation fails.

Expected results:

Cluster creation succeeds.

Additional info:

 


Description of problem:

Fix a grammatical error in the feedback modal: remove 'the' before the OpenShift text.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Cluster Network Operator managed component multus-admission-controller does not conform to Hypershift control plane expectations.

When CNO is managed by Hypershift, multus-admission-controller and other CNO-managed deployments should run with non-root security context. If Hypershift runs control plane on kubernetes (as opposed to Openshift) management cluster, it adds pod security context to its managed deployments, including CNO, with runAsUser element inside. In such a case CNO should do the same, set security context for its managed deployments, like multus-admission-controller, to meet Hypershift security rules.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1.Create OCP cluster using Hypershift using Kube management cluster
2.Check pod security context of multus-admission-controller

Actual results:

no pod security context is set on multus-admission-controller

Expected results:

pod security context is set with runAsUser: xxxx
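
A minimal sketch of what setting that security context on a managed deployment could look like; the helper name and UID are hypothetical, since the real value would come from the platform (elided as xxxx above):

package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// setNonRootSecurityContext is a hypothetical helper that applies a non-root
// pod security context to a managed deployment such as multus-admission-controller.
func setNonRootSecurityContext(d *appsv1.Deployment, uid int64) {
	runAsNonRoot := true
	d.Spec.Template.Spec.SecurityContext = &corev1.PodSecurityContext{
		RunAsNonRoot: &runAsNonRoot,
		RunAsUser:    &uid, // placeholder UID; the real value is assigned by the platform
	}
}

func main() {
	d := &appsv1.Deployment{}
	setNonRootSecurityContext(d, 1001)
	fmt.Println(*d.Spec.Template.Spec.SecurityContext.RunAsUser)
}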

Additional info:

Corresponding CNO change 

Description of problem:

The oc binary stored at /usr/local/bin in the cli-artifacts image of a non-amd64 payload is not the one for the architecture bound to the payload. It is an amd64 binary.

Version-Release number of selected component (if applicable):

4.11.4

How reproducible:

always

Steps to Reproduce:

1. CLI_ARTIFACTS_IMAGE=$(oc adm release info quay.io/openshift-release-dev/ocp-release:4.11.4-aarch64 --image-for=cli-artifacts)
2. CONTAINER=$(podman create $CLI_ARTIFACTS_IMAGE)
3. podman cp $CONTAINER:/usr/bin/oc /tmp/oc
4. file /tmp/oc

Actual results:

/tmp/oc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked,.....

Expected results:

It should be a binary built for the architecture of the image, i.e., using the above aarch64 payload should yield an arm64 binary at /usr/bin, with the binaries for the other architectures in /usr/share/openshift.

Additional info:

https://github.com/openshift/oc/blob/master/images/cli-artifacts/Dockerfile.rhel#L13

Description of problem:

The aforementioned test in the e2e origin test suite sometimes fails because it can't connect to the API endpoint.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Sometimes

Steps to Reproduce:

1. See https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-azure-ovn-upgrade/1673703516675248128
2.
3.

Actual results:

The test failed.

Expected results:

The test should retry a couple of times with a delay when it didn't get an HTTP response from the endpoint (e.g. connection issue).

Additional info:

 

Description of problem:

Currently, we unconditionally use an image mapping from the management cluster if a mapping exists for ocp-release-dev or ocp/release. When the individual images do not use those registries, the wrong mapping is used.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1.Create an ICSP on a management cluster:

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: image-policy-39
spec:
  repositoryDigestMirrors:
  - mirrors:
    - quay.io/openshift-release-dev/ocp-release
    - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
    source: quay.io/openshift-release-dev/ocp-release

2. Create a HostedCluster that uses a CI release

Actual results:

Nodes never join because the ignition server looks up the wrong image for the CCO and MCO.

Expected results:

Nodes can join the cluster.

Additional info:

 

Please review the following PR: https://github.com/openshift/machine-api-provider-nutanix/pull/47

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

We suspect that https://github.com/openshift/oc/pull/1521 has broken all Metal jobs, an example of a failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-baremetal-operator/355/pull-ci-openshift-cluster-baremetal-operator-master-e2e-metal-ipi-ovn-ipv6/1691359315740332032.

Details:

The testing scripts we use set KUBECONFIG in advance to the location where we'll create it. At the time "oc adm extract" is called, the file does not exist yet. While you could argue that we should not do it, it has worked for years, and it's quite possible that customers have similar automation (e.g. setting KUBECONFIG as a global variable in their playbooks). In any case, I don't think "oc adm extract" should try to read the configuration if it does not explicitly need it.

Updated details:

After the change, "oc adm extract" expects KUBECONFIG to be present, but at the point when we call it, there is no cluster. I initially assumed that unsetting KUBECONFIG will help but it does not.

This is a clone of issue OCPBUGS-20331. The following is the description of the original issue:

Description of problem:

a 4.13 cluster installed with
baselineCapabilitySet: None
additionalEnabledCapabilities: ['NodeTuning', 'CSISnapshot']

an upgrade to 4.14 causes the previously disabled Console to become ImplicitlyEnabled (in contrast with newly added 4.14 capabilities, which are expected to be enabled implicitly in this case)

'ImplicitlyEnabledCapabilities'
{
  "lastTransitionTime": "2023-10-09T19:08:29Z",
  "message": "The following capabilities could not be disabled: Console, ImageRegistry, MachineAPI",
  "reason": "CapabilitiesImplicitlyEnabled",
  "status": "True",
  "type": "ImplicitlyEnabledCapabilities"
}

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-08-220853

How reproducible:

100%

Steps to Reproduce:

as described above

Additional info:

the root cause appears to be https://github.com/openshift/cluster-kube-apiserver-operator/pull/1542

more info in https://redhat-internal.slack.com/archives/CB48XQ4KZ/p1696940380413289

Description of problem:

On Azure, when the vmSize or location field is dropped from the CPMS providerSpec, a master machine ends up in a creating/deleting loop.

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2022-10-25-210451

How reproducible:

always

Steps to Reproduce:

1. Create an Azure cluster with a CPMS
2. Activate the CPMS
3. Drop the vmsize field from the providerSpec

Actual results:

New machine is created, deleted, created, deleted ...
$ oc get machine         
NAME                                    PHASE      TYPE              REGION   ZONE   AGE
zhsuncpms1-7svhz-master-0               Running    Standard_D8s_v3   eastus   2      3h21m
zhsuncpms1-7svhz-master-1               Running    Standard_D8s_v3   eastus   3      3h21m
zhsuncpms1-7svhz-master-2               Running    Standard_D8s_v3   eastus   1      3h21m
zhsuncpms1-7svhz-master-l489k-0         Deleting                                     0s
zhsuncpms1-7svhz-worker-eastus1-6vsl4   Running    Standard_D4s_v3   eastus   1      3h16m
zhsuncpms1-7svhz-worker-eastus2-dpvp9   Running    Standard_D4s_v3   eastus   2      3h16m
zhsuncpms1-7svhz-worker-eastus3-sg7dx   Running    Standard_D4s_v3   eastus   3      19m
$ oc get machine  
NAME                                    PHASE     TYPE              REGION   ZONE   AGE
zhsuncpms1-7svhz-master-0               Running   Standard_D8s_v3   eastus   2      3h26m
zhsuncpms1-7svhz-master-1               Running   Standard_D8s_v3   eastus   3      3h26m
zhsuncpms1-7svhz-master-2               Running   Standard_D8s_v3   eastus   1      3h26m
zhsuncpms1-7svhz-master-wmnfq-0                                                     1s
zhsuncpms1-7svhz-worker-eastus1-6vsl4   Running   Standard_D4s_v3   eastus   1      3h21m
zhsuncpms1-7svhz-worker-eastus2-dpvp9   Running   Standard_D4s_v3   eastus   2      3h21m
zhsuncpms1-7svhz-worker-eastus3-sg7dx   Running   Standard_D4s_v3   eastus   3      24m

$ oc get controlplanemachineset   
NAME      DESIRED   CURRENT   READY   UPDATED   UNAVAILABLE   STATE    AGE
cluster   3         4         3                               Active   25m
$ oc get co control-plane-machine-set      
NAME                        VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
control-plane-machine-set   4.12.0-0.nightly-2022-10-25-210451   True        True          False      4h38m   Observed 3 replica(s) in need of update

Expected results:

Errors are logged and no machine is created or new machine could be created successful.

Additional info:

When vmSize is dropped, a new machine can still be created (the default value appears to be Standard_D4s_v3), but editing a MachineSet the same way is not allowed:
$ oc get machine                
NAME                                      PHASE         TYPE              REGION   ZONE   AGE
zhsunazure11-cdbs8-master-0               Running       Standard_D8s_v3   eastus   2      4h7m
zhsunazure11-cdbs8-master-000             Provisioned   Standard_D4s_v3   eastus   2      48s
zhsunazure11-cdbs8-master-1               Running       Standard_D8s_v3   eastus   3      4h7m
zhsunazure11-cdbs8-master-2               Running       Standard_D8s_v3   eastus   1      4h7m
zhsunazure11-cdbs8-worker-eastus1-5v66l   Running       Standard_D4s_v3   eastus   1      4h1m
zhsunazure11-cdbs8-worker-eastus1-test    Running       Standard_D4s_v3   eastus   1      7m45s
zhsunazure11-cdbs8-worker-eastus2-hm9bm   Running       Standard_D4s_v3   eastus   2      4h1m
zhsunazure11-cdbs8-worker-eastus3-7j9kf   Running       Standard_D4s_v3   eastus   3      4h1m

$ oc edit machineset zhsuncpms1-7svhz-worker-eastus3         
error: machinesets.machine.openshift.io "zhsuncpms1-7svhz-worker-eastus3" could not be patched: admission webhook "validation.machineset.machine.openshift.io" denied the request: providerSpec.vmSize: Required value: vmSize should be set to one of the supported Azure VM sizes
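
A minimal sketch of the kind of required-field check the expected behaviour implies, using a hypothetical trimmed-down provider spec struct rather than the actual machine API types:

package main

import (
	"errors"
	"fmt"
)

// azureProviderSpec is a hypothetical stand-in for the provider spec fields
// this bug is about.
type azureProviderSpec struct {
	VMSize   string
	Location string
}

// validateAzureProviderSpec mirrors the machineset webhook behaviour quoted
// in Additional info: missing required fields should be rejected up front
// instead of producing a create/delete loop.
func validateAzureProviderSpec(s azureProviderSpec) error {
	var errs []error
	if s.VMSize == "" {
		errs = append(errs, errors.New("providerSpec.vmSize: Required value: vmSize should be set to one of the supported Azure VM sizes"))
	}
	if s.Location == "" {
		errs = append(errs, errors.New("providerSpec.location: Required value"))
	}
	return errors.Join(errs...)
}

func main() {
	fmt.Println(validateAzureProviderSpec(azureProviderSpec{}))
}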

Description of problem:

We need to update the govc version to support PR https://github.com/openshift/release/pull/42334, since the command "govc vm.network.change -dc xxx -vm -net xxxxx" is only supported from govc v0.30.4 onward; with an older govc the VM cannot fetch its IP correctly.

Version-Release number of selected component (if applicable):

ocp 4.14

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

"govc: path 'ci-segment-151'" resolves to multiple networks
if specific the -net with network path, will got "govc: network '/IBMCloud/host/vcs-mdcnc-workload-1/ci-segment-151' not found"

Expected results:

After the govc version update, govc vm.network.change can be used to select the unique network.

Additional info:

 

This is a clone of issue OCPBUGS-6513. The following is the description of the original issue:

Description of problem:

Using the web console on the RH Developer Sandbox, I created the most basic Knative Service (KSVC) using the suggested default image, openshift/hello-openshift.

I then tried to change the displayed icon using the web UI, and an error about probes was displayed. See attached images.

The error has no relevance to the item changed.

Version-Release number of selected component (if applicable):

whatever the RH sandbox uses, this value is not displayed to users

How reproducible:

very

Steps to Reproduce:

Using the web console on the RH Developer Sandbox, create the most basic Knative Service (KSVC) using the default image openshift/hello-openshift.

Then use the web UI to edit the KSVC sample and change the icon from the OpenShift logo to, for instance, the 3scale logo.

When saving the form, an error is reported: admission webhook 'validation webhook.serving.knative.dev' denied the request: validation failed: must not set the field(s): spec.template.spec.containers[0].readiness.Probe

Actual results:

 

Expected results:

Either a failure message related to changing the icon, or the icon change to take effect

Additional info:

KSVC details as provided by the web console.

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: sample
  namespace: agroom-dev
spec:
  template:
    spec:
      containers:
        - image: openshift/hello-openshift

This is a clone of issue OCPBUGS-22276. The following is the description of the original issue:

Description of problem:

8.1
  tagged from docker.io/openshift/wildfly-81-centos7:latest
    prefer registry pullthrough when referencing this tag

  Build and run WildFly 8.1 applications on CentOS 7. For more information about using this builder image, including OpenShift considerations, see https://github.com/openshift-s2i/s2i-wildfly/blob/master/README.md.
  Tags: builder, wildfly, java
  Supports: wildfly:8.1, jee, java
  Example Repo: https://github.com/openshift/openshift-jee-sample.git

  ! error: Import failed (Unauthorized): you may not have access to the container image "docker.io/openshift/wildfly-81-centos7:latest"
      20 minutes ago

error: imported completed with errors
[Mon Oct 23 15:23:32 UTC 2023] Retrying image import openshift/wildfly:10.1
error: tag latest failed: you may not have access to the container image "docker.io/openshift/wildfly-101-centos7:latest"
imagestream.image.openshift.io/wildfly imported with errors

Name:       wildfly
Namespace:  openshift
Created:    21 minutes ago

Version-Release number of selected component (if applicable):

4.14 / 4.15

How reproducible:

Often on vSphere jobs, perhaps because they lack a local mirror?

Steps to Reproduce:

1.
2.
3.

Actual results:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/44127/rehearse-44127-periodic-ci-openshift-release-master-okd-scos-4.14-e2e-aws-ovn-serial/1716463869561409536

Expected results:

ci jobs run successfully

Additional info:

 

Description of problem:

IPI installation fails in AWS: CreateVpcEndpoint is not supported in the C2S region.

Version-Release number of selected component (if applicable):

 

How reproducible:

always

Steps to Reproduce:

IPI installation in AWS

1. terraform apply
2. When using an aws_vpc_endpoint resource with AWS Terraform provider >= 2.53.0 in the C2S regions (us-iso*), an UnsupportedOperation error is thrown.
3.

Actual results:

Unable to install OCP 4.x in the AWS C2S (top secret) region.

Expected results:

IPI installation succeeds in the AWS C2S region.

Additional info:

Upstream bug:

[Bug]: C2S CreateVpcEndpoint UnsupportedOperation: The operation is not supported in this region! (hashicorp/terraform-provider-aws issue #27048)
https://github.com/hashicorp/terraform-provider-aws/issues/27048

Description of problem:

When searching for InstallPlans with a specific project selected, InstallPlans from all namespaces are still listed; the selected project is not applied as a filter.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-05-112833

How reproducible:

Always

Steps to Reproduce:

1. Install some operators to specific namespace and all namespaces
$ oc get ip -A
NAMESPACE             NAME            CSV                                 APPROVAL    APPROVED
default               install-tftg4   etcdoperator.v0.9.4                 Automatic   true
openshift-operators   install-5g2l4   3scale-community-operator.v0.10.1   Automatic   true
$ oc get sub -A
NAMESPACE             NAME                        PACKAGE                     SOURCE                CHANNEL
default               etcd                        etcd                        community-operators   singlenamespace-alpha
openshift-operators   3scale-community-operator   3scale-community-operator   community-operators   threescale-2.13  
2. Navigate to the Home -> Search page, select project 'default' in the project dropdown, and choose the 'InstallPlan' resource
3. Check the filtered list

Actual results:

3. InstallPlans in all namespaces are listed

Expected results:

3. only the InstallPlan in 'default' project should be listed

Additional info:

 

Description of problem:

We need to update the operator to be in sync with the Kubernetes API version used by OCP 4.13. We also need to sync our samples libraries with the latest available libraries. Any deprecated libraries should be removed as well.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

In the topology side panel, in the PipelineRuns section, clicking the "Start last run" button displays an error alert message.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Create a deployment with pipeline
2. Click on deployment to open side panel
3. Click "Start last run" button in PipelineRuns section 

Actual results:

Error alert message is displayed

Expected results:

It should be possible to start the last run.

Additional info:

 

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

A HyperShift KubeVirt provider hosted cluster cannot start up after activating OVN-K interconnect in the hosted cluster.

The issue is that the OVN-K configurations mismatch:

The cluster manager config in the hosted cluster namespace:

  ovnkube.conf: |-
    [default]
    mtu="8801"
    cluster-subnets="10.132.0.0/14/23"
    encap-port="9880"
    enable-lflow-cache=true
    lflow-cache-limit-kb=1048576

    [kubernetes]
    service-cidrs="172.31.0.0/16"
    ovn-config-namespace="openshift-ovn-kubernetes"
    cacert="/hosted-ca/ca.crt"
    apiserver="https://kube-apiserver:6443"
    host-network-namespace="openshift-host-network"
    platform-type="KubeVirt"
    dns-service-namespace="openshift-dns"
    dns-service-name="dns-default"

    [ovnkubernetesfeature]
    enable-egress-ip=true
    enable-egress-firewall=true
    enable-egress-qos=true
    enable-egress-service=true
    egressip-node-healthcheck-port=9107

    [gateway]
    mode=shared
    nodeport=true
    v4-join-subnet="100.65.0.0/16"

    [masterha]
    election-lease-duration=137
    election-renew-deadline=107
    election-retry-period=26

The controller config in the hosted cluster:
  ovnkube.conf: |-
    [default]
    mtu="8801"
    cluster-subnets="10.132.0.0/14/23"
    encap-port="9880"
    enable-lflow-cache=true
    lflow-cache-limit-kb=1048576
    enable-udp-aggregation=true

    [kubernetes]
    service-cidrs="172.31.0.0/16"
    ovn-config-namespace="openshift-ovn-kubernetes"
    apiserver="https://a392ee248c42a4ffca67f2909823466e-18e866c0f5fb5880.elb.us-west-2.amazonaws.com:6443"
    host-network-namespace="openshift-host-network"
    platform-type="KubeVirt"
    healthz-bind-address="0.0.0.0:10256"
    dns-service-namespace="openshift-dns"
    dns-service-name="dns-default"

    [ovnkubernetesfeature]
    enable-egress-ip=true
    enable-egress-firewall=true
    enable-egress-qos=true
    enable-egress-service=true
    egressip-node-healthcheck-port=9107
    enable-multi-network=true

    [gateway]
    mode=shared
    nodeport=true

    [masterha]
    election-lease-duration=137
    election-renew-deadline=107
    election-retry-period=26

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Deploy latest 4.14 ocp clustrer
2. Install latest hypershift operator
3. Deploy hosted cluster with latest 4.14 ocp release image

Actual results:

The hosted cluster gets stuck at:

network                                    4.14.0-0.ci-2023-08-20-221659   True        True          False      3h53m   DaemonSet "/openshift-multus/network-metrics-daemon" is waiting for other operators to become ready...

Expected results:

All the hosted cluster operators should be healthy.

Additional info:

 

Some unit tests are flaky because we assert that timestamps have changed.

When creation and the test happen very quickly, the timestamp may appear unchanged.

https://redhat-internal.slack.com/archives/C014N2VLTQE/p1681827276489839

We can fix this by simulating host creation as having happened in the past.
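
A minimal sketch of that approach, using a hypothetical host struct (the real test fixtures differ): back-dating the creation timestamp makes the "timestamp changed" assertion deterministic.

package main

import (
	"fmt"
	"time"
)

// host is a hypothetical stand-in for the object whose timestamps the
// flaky tests compare.
type host struct {
	CreatedAt time.Time
	UpdatedAt time.Time
}

func main() {
	// Simulate creation in the past so that any update performed by the
	// test is guaranteed to produce a strictly newer timestamp, even when
	// creation and update happen within the clock's resolution.
	h := host{CreatedAt: time.Now().Add(-1 * time.Hour)}

	h.UpdatedAt = time.Now() // the action under test

	fmt.Println(h.UpdatedAt.After(h.CreatedAt)) // always true now
}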

This is a clone of issue OCPBUGS-20338. The following is the description of the original issue:

Description of problem:

In RHOCP 4.14, the new Node dashboard feature is not showing the expected metric/dashboard data.

[hjaiswal@hjaiswal 4_14]$ oc get nodes
NAME                                             STATUS     ROLES                  AGE     VERSION
ip-10-0-26-232.ap-southeast-1.compute.internal   Ready      control-plane,master   6h12m   v1.27.6+1648878
ip-10-0-42-100.ap-southeast-1.compute.internal   Ready      control-plane,master   6h12m   v1.27.6+1648878
ip-10-0-46-197.ap-southeast-1.compute.internal   Ready      worker                 6h3m    v1.27.6+1648878
ip-10-0-66-225.ap-southeast-1.compute.internal   NotReady   worker                 6h3m    v1.27.6+1648878
ip-10-0-8-20.ap-southeast-1.compute.internal     Ready      worker                 6h5m    v1.27.6+1648878
ip-10-0-80-84.ap-southeast-1.compute.internal    Ready      control-plane,master   6h12m   v1.27.6+1648878

Version-Release number of selected component (if applicable):

4.14

How reproducible:

 

Steps to Reproduce:

1. Check whether all the nodes are in Ready state (cluster version 4.14).
2. ssh/debug into any worker node.
3. Stop the kubelet service.
4. Check whether the node went into NotReady state.
5. Open the OpenShift console and go to Observe -> Dashboards, then select the new "Node cluster" dashboard.
6. It shows "0" nodes in NotReady state, but it should display "1" node in NotReady state.

 

Actual results:

In the Node cluster dashboard there is no count for the NotReady node.
 

Expected results:

In the Node cluster dashboard, the NotReady node count should be 1.

Additional info:

Tested in AWS IPI cluster

Description of problem:

CCO watches too many things.

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

1. Run CCO in a cluster with a large amount of data in ConfigMaps or Secrets or Namespaces.
2. Watch memory usage scale linearly with the size of those objects.
3.

Actual results:

Memory usage scales linearly with the size of all ConfigMaps, Secrets and Namespaces on the cluster.

Expected results:

Memory usage scales linearly with the data CCO actually needs to function.
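
One common way to keep a controller's cache from scaling with every Secret, ConfigMap and Namespace on the cluster is to restrict its informers with a label or field selector. A minimal client-go sketch follows; the label selector value is an assumption for illustration, not CCO's actual marker:

package main

import (
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// Only watch objects the operator actually cares about, instead of
	// caching every Secret/ConfigMap/Namespace on the cluster.
	factory := informers.NewSharedInformerFactoryWithOptions(
		clientset,
		10*time.Minute,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			// Hypothetical label; a real operator would use its own marker.
			opts.LabelSelector = "example.com/owned-by=cloud-credential-operator"
		}),
	)

	_ = factory.Core().V1().Secrets().Informer()
	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
}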

Additional info:

 

Description of problem:

This ticket was created to track: https://issues.redhat.com/browse/CNV-31770

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/telemeter/pull/460

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

KCM crashes when the topology cache's HasPopulatedHints method attempts concurrent map access.

Miciah has started working on the upstream fix, and we need to bring the changes into openshift/kubernetes as soon as we can.

See https://redhat-internal.slack.com/archives/C01CQA76KMX/p1684876782205129 for more context.
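
The crash class here is unsynchronized map access. A minimal Go sketch of the general remedy (guarding the shared map with a sync.RWMutex; this is not the actual upstream patch) follows:

package main

import (
	"fmt"
	"sync"
)

// hintCache is a simplified stand-in for a cache whose map is read and
// written from multiple goroutines.
type hintCache struct {
	mu    sync.RWMutex
	hints map[string]bool
}

func (c *hintCache) set(key string, v bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.hints[key] = v
}

func (c *hintCache) hasPopulatedHints(key string) bool {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.hints[key]
}

func main() {
	c := &hintCache{hints: map[string]bool{}}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func(i int) { defer wg.Done(); c.set(fmt.Sprint(i), true) }(i)
		go func(i int) { defer wg.Done(); _ = c.hasPopulatedHints(fmt.Sprint(i)) }(i)
	}
	wg.Wait()
	fmt.Println("done without concurrent map access panics")
}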

Version-Release number of selected component (if applicable):

 

How reproducible:

CI 4.14 upgrade jobs run into this problem quite often: https://search.ci.openshift.org/?search=pkg%2Fcontroller%2Fendpointslice%2Ftopologycache%2Ftopologycache.go&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job 

Steps to Reproduce:

 

Actual results:

KCM crashing

Expected results:

KCM not crashing

Additional info:

 

Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-15999.

We should include HostedClusterDegraded in hypershift_hostedclusters_failure_conditions metric so it's obvious when there's an issue across the fleet.

- lastTransitionTime: "2023-05-04T13:53:50Z"
  message: kube-controller-manager deployment has 1 unavailable replicas
  observedGeneration: 1
  reason: UnavailableReplicas
  status: "True"
  type: Degraded
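
A minimal client_golang sketch of the idea (assuming the metric is a gauge vector labelled by condition type; the second condition value in the loop is purely illustrative): HostedClusterDegraded simply becomes one more reported label value.

package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

var failureConditions = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "hypershift_hostedclusters_failure_conditions",
		Help: "Number of HostedClusters reporting a given failure condition.",
	},
	[]string{"condition"},
)

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(failureConditions)

	// Hypothetical tally: include Degraded alongside an illustrative existing condition.
	for _, cond := range []string{"HostedClusterDegraded", "ClusterVersionFailing"} {
		failureConditions.WithLabelValues(cond).Set(1)
	}

	fmt.Println("metric registered with Degraded included")
}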

 

Description of problem:
It is better for pod-security admission config to use v1 like upstream instead of still using v1beta1

Version-Release number of selected component (if applicable):
4.12, 4.13

How reproducible:
Always

Steps to Reproduce:
1. In upstream, when it was 1.24, https://v1-24.docs.kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-admission-controller/#configure-the-admission-controller shows "pod-security.admission.config.k8s.io/v1beta1".

When it was 1.25 (OCP 4.12), https://v1-25.docs.kubernetes.io/docs/tasks/configure-pod-container/enforce-standards-admission-controller/#configure-the-admission-controller does not show "shows pod-security.admission.config.k8s.io/v1beta1" any longer. In the bottom, it notes: pod-security.admission.config.k8s.io/v1 configuration requires v1.25+. For v1.23 and v1.24, use v1beta1.

In OCP 4.12 (1.25) and 4.13 (1.26), it is still v1beta1; we should align with upstream:

4.12:
$ oc version
..
Server Version: 4.12.9
Kubernetes Version: v1.25.7+eab9cc9

$ jq "" $(oc extract cm/config -n openshift-kube-apiserver --confirm) | jq '.admission.pluginConfig.PodSecurity'
{
  "configuration": {
    "apiVersion": "pod-security.admission.config.k8s.io/v1beta1",
    "defaults": {
      "audit": "restricted",
      "audit-version": "latest",
      "enforce": "privileged",
      "enforce-version": "latest",
      "warn": "restricted",
      "warn-version": "latest"
    },
    "exemptions": {
      "usernames": [
        "system:serviceaccount:openshift-infra:build-controller"
      ]
    },
    "kind": "PodSecurityConfiguration"
  }
}

4.13:
$ oc version
...
Server Version: 4.13.0-0.nightly-2023-03-23-204038
Kubernetes Version: v1.26.2+dc93b13

$ jq "" $(oc extract cm/config -n openshift-kube-apiserver --confirm) | jq '.admission.pluginConfig.PodSecurity'
{
  "configuration": {
    "apiVersion": "pod-security.admission.config.k8s.io/v1beta1",
    "defaults": {
      "audit": "restricted",
      "audit-version": "latest",
      "enforce": "privileged",
      "enforce-version": "latest",
      "warn": "restricted",
      "warn-version": "latest"
    },
    "exemptions": {
      "usernames": [
        "system:serviceaccount:openshift-infra:build-controller"
      ]
    },
    "kind": "PodSecurityConfiguration"
  }
}

Actual results:

See above.

Expected results:

The pod-security admission config should align with upstream and use v1 instead of v1beta1.

Additional info:

 

Description of problem:

CPO reconciliation loop hangs after "Reconciling infrastructure status"

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Frequently

Steps to Reproduce:

1.Create a HostedCluster with a recent 4.14 release image
2.Watch CPO logs
3.

Actual results:

Reconcile gets stuck

Expected results:

Reconcile happens fairly quickly

Additional info:

 

Description of problem:

Deployment of a standard masters+workers cluster using 4.13.0-rc.6 does not configure the cgroup structure according to OCPNODE-1539

Version-Release number of selected component (if applicable):

OCP 4.13.0-rc.6

How reproducible:

Always

Steps to Reproduce:

1. Deploy the cluster
2. Check for presence of /sys/fs/cgroup/cpuset/system*
3. Check the status of cpu balancing of the root cpuset cgroup (should be disabled)

Actual results:

No system cpuset exists and all services are still present in the root cgroup with cpu balancing enabled.

Expected results:

 

Additional info:

The code has a bug we missed. It is nested under the Workload partitioning check on line https://github.com/haircommander/cluster-node-tuning-operator/blob/123e26df30c66fd5c9836726bd3e4791dfd82309/pkg/performanceprofile/controller/performanceprofile/components/machineconfig/machineconfig.go#L251

Description of problem:

NetworkPolicyLegacy tests time out on the bump PR; the latest is https://github.com/openshift/origin/pull/27912
Job example: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/27912/pull-ci-openshift-origin-master-e2e-gcp-ovn/1655997089001246720

It seems the problem is the 15-minute timeout; the test fails with "Interrupted by User". I think this is the change that affected it: https://github.com/kubernetes/kubernetes/pull/112923.

From what I saw in the logs, it seems "testCannotConnect" reaches a 5-minute timeout instead of completing in ~45 seconds based on the client pod command. But this is NetworkPolicyLegacy, so I'm not sure how much time we want to spend debugging it.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

Slack thread https://redhat-internal.slack.com/archives/C04UQLWQAP3/p1683640905643069

Description of problem:

The APIServer endpoint isn't healthy after a PublicAndPrivate cluster is created. The cluster's PROGRESS is Completed and PROGRESSING is False, the nodes are Ready, and the cluster operators on the guest cluster are Available; the only issue is that the condition of type Available is False because the APIServer endpoint is not healthy.

jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters
NAME   VERSION               KUBECONFIG         PROGRESS  AVAILABLE  PROGRESSING  MESSAGE
jz-test  4.14.0-0.nightly-2023-04-30-235516  jz-test-admin-kubeconfig  Completed  False    False     APIServer endpoint a23663b1e738a4d6783f6256da73fe76-2649b36a23f49ed7.elb.us-east-2.amazonaws.com is not healthy

jiezhao-mac:hypershift jiezhao$ oc get hostedcluster/jz-test -n clusters -ojsonpath='{.spec.platform.aws.endpointAccess}{"\n"}'
PublicAndPrivate

jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jz-test
NAME                                                  READY   STATUS    RESTARTS   AGE
aws-cloud-controller-manager-666559d4f-rdsw4          2/2     Running   0          149m
aws-ebs-csi-driver-controller-79fdfb6c76-vb7wr        7/7     Running   0          148m
aws-ebs-csi-driver-operator-7dbd789984-mb9rp          1/1     Running   0          148m
capi-provider-5b7847db9-nlrvz                         2/2     Running   0          151m
catalog-operator-7ccb468d86-7c5j6                     2/2     Running   0          149m
certified-operators-catalog-895787778-5rjb6           1/1     Running   0          149m
cloud-network-config-controller-86698fd7dd-kgzhv      3/3     Running   0          148m
cluster-api-6fd4f86878-hjw59                          1/1     Running   0          151m
cluster-autoscaler-bdd688949-f9xmk                    1/1     Running   0          150m
cluster-image-registry-operator-6f5cb67d88-8svd6      3/3     Running   0          149m
cluster-network-operator-7bc69f75f4-npjfs             1/1     Running   0          149m
cluster-node-tuning-operator-5855b6576b-rckhh         1/1     Running   0          149m
cluster-policy-controller-56d4d6b57c-glx4w            1/1     Running   0          149m
cluster-storage-operator-7cc56c68bb-jd4d2             1/1     Running   0          149m
cluster-version-operator-bd969b677-bh4w4              1/1     Running   0          149m
community-operators-catalog-5c545484d7-hbzb4          1/1     Running   0          149m
control-plane-operator-fc49dcbb4-5ncvf                2/2     Running   0          151m
csi-snapshot-controller-85f7cc9945-n5vgq              1/1     Running   0          149m
csi-snapshot-controller-operator-6597b45897-hqf5p     1/1     Running   0          149m
csi-snapshot-webhook-644d765546-lk9hj                 1/1     Running   0          149m
dns-operator-5b5577d6c7-8dh8d                         1/1     Running   0          149m
etcd-0                                                2/2     Running   0          150m
hosted-cluster-config-operator-5b75ccf55d-6rzch       1/1     Running   0          149m
ignition-server-596fc9d9fb-sb94h                      1/1     Running   0          150m
ingress-operator-6497d476bc-whssz                     3/3     Running   0          149m
konnectivity-agent-6656d8dfd6-h5tcs                   1/1     Running   0          150m
konnectivity-server-5ff9d4b47-stb2m                   1/1     Running   0          150m
kube-apiserver-596fc4bb8b-7kfd8                       3/3     Running   0          150m
kube-controller-manager-6f86bb7fbd-4wtxk              1/1     Running   0          138m
kube-scheduler-bf5876b4b-flk96                        1/1     Running   0          149m
machine-approver-574585d8dd-h5ffh                     1/1     Running   0          150m
multus-admission-controller-67b6f85fbf-bfg4x          2/2     Running   0          148m
oauth-openshift-6b6bfd55fb-8sdq7                      2/2     Running   0          148m
olm-operator-5d97fb977c-sbf6w                         2/2     Running   0          149m
openshift-apiserver-5bb9f99974-2lfp4                  3/3     Running   0          138m
openshift-controller-manager-65666bdf79-g8cf5         1/1     Running   0          149m
openshift-oauth-apiserver-56c8565bb6-6b5cv            2/2     Running   0          149m
openshift-route-controller-manager-775f844dfc-jj2ft   1/1     Running   0          149m
ovnkube-master-0                                      7/7     Running   0          148m
packageserver-6587d9674b-6jwpv                        2/2     Running   0          149m
redhat-marketplace-catalog-5f6d45b457-hdn77           1/1     Running   0          149m
redhat-operators-catalog-7958c4449b-l4hbx             1/1     Running   0          12m
router-5b7899cc97-chs6t                               1/1     Running   0          150m

jiezhao-mac:hypershift jiezhao$ oc get node --kubeconfig=hostedcluster.kubeconfig 
NAME                                        STATUS   ROLES    AGE    VERSION
ip-10-0-137-99.us-east-2.compute.internal   Ready    worker   131m   v1.26.2+d2e245f
ip-10-0-140-85.us-east-2.compute.internal   Ready    worker   132m   v1.26.2+d2e245f
ip-10-0-141-46.us-east-2.compute.internal   Ready    worker   131m   v1.26.2+d2e245f
jiezhao-mac:hypershift jiezhao$ 
jiezhao-mac:hypershift jiezhao$ oc get co --kubeconfig=hostedcluster.kubeconfig 
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
console                                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      126m    
csi-snapshot-controller                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
dns                                        4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m    
image-registry                             4.14.0-0.nightly-2023-04-30-235516   True        False         False      128m    
ingress                                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m    
insights                                   4.14.0-0.nightly-2023-04-30-235516   True        False         False      130m    
kube-apiserver                             4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
kube-controller-manager                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
kube-scheduler                             4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
kube-storage-version-migrator              4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m    
monitoring                                 4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m    
network                                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
node-tuning                                4.14.0-0.nightly-2023-04-30-235516   True        False         False      131m    
openshift-apiserver                        4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
openshift-controller-manager               4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
openshift-samples                          4.14.0-0.nightly-2023-04-30-235516   True        False         False      129m    
operator-lifecycle-manager                 4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
operator-lifecycle-manager-catalog         4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
operator-lifecycle-manager-packageserver   4.14.0-0.nightly-2023-04-30-235516   True        False         False      140m    
service-ca                                 4.14.0-0.nightly-2023-04-30-235516   True        False         False      130m    
storage                                    4.14.0-0.nightly-2023-04-30-235516   True        False         False      131m    
jiezhao-mac:hypershift jiezhao$ 

HC conditions:
==============
  status:
    conditions:
    - lastTransitionTime: "2023-05-01T19:45:49Z"
      message: All is well
      observedGeneration: 3
      reason: AsExpected
      status: "True"
      type: ValidAWSIdentityProvider
    - lastTransitionTime: "2023-05-01T20:00:18Z"
      message: Cluster version is 4.14.0-0.nightly-2023-04-30-235516
      observedGeneration: 3
      reason: FromClusterVersion
      status: "False"
      type: ClusterVersionProgressing
    - lastTransitionTime: "2023-05-01T19:46:22Z"
      message: Payload loaded version="4.14.0-0.nightly-2023-04-30-235516" image="registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-04-30-235516"
        architecture="amd64"
      observedGeneration: 3
      reason: PayloadLoaded
      status: "True"
      type: ClusterVersionReleaseAccepted
    - lastTransitionTime: "2023-05-01T20:03:14Z"
      message: Condition not found in the CVO.
      observedGeneration: 3
      reason: StatusUnknown
      status: Unknown
      type: ClusterVersionUpgradeable
    - lastTransitionTime: "2023-05-01T20:00:18Z"
      message: Done applying 4.14.0-0.nightly-2023-04-30-235516
      observedGeneration: 3
      reason: FromClusterVersion
      status: "True"
      type: ClusterVersionAvailable
    - lastTransitionTime: "2023-05-01T20:00:18Z"
      message: ""
      observedGeneration: 3
      reason: FromClusterVersion
      status: "True"
      type: ClusterVersionSucceeding
    - lastTransitionTime: "2023-05-01T19:47:51Z"
      message: The hosted cluster is not degraded
      observedGeneration: 3
      reason: AsExpected
      status: "False"
      type: Degraded
    - lastTransitionTime: "2023-05-01T19:45:01Z"
      message: ""
      observedGeneration: 3
      reason: QuorumAvailable
      status: "True"
      type: EtcdAvailable
    - lastTransitionTime: "2023-05-01T19:45:38Z"
      message: Kube APIServer deployment is available
      observedGeneration: 3
      reason: AsExpected
      status: "True"
      type: KubeAPIServerAvailable
    - lastTransitionTime: "2023-05-01T19:44:27Z"
      message: All is well
      observedGeneration: 3
      reason: AsExpected
      status: "True"
      type: InfrastructureReady
    - lastTransitionTime: "2023-05-01T19:44:11Z"
      message: External DNS is not configured
      observedGeneration: 3
      reason: StatusUnknown
      status: Unknown
      type: ExternalDNSReachable
    - lastTransitionTime: "2023-05-01T19:44:19Z"
      message: Configuration passes validation
      observedGeneration: 3
      reason: AsExpected
      status: "True"
      type: ValidHostedControlPlaneConfiguration
    - lastTransitionTime: "2023-05-01T19:44:11Z"
      message: AWS KMS is not configured
      observedGeneration: 3
      reason: StatusUnknown
      status: Unknown
      type: ValidAWSKMSConfig
    - lastTransitionTime: "2023-05-01T19:44:37Z"
      message: All is well
      observedGeneration: 3
      reason: AsExpected
      status: "True"
      type: ValidReleaseInfo
    - lastTransitionTime: "2023-05-01T19:44:11Z"
      message: APIServer endpoint a23663b1e738a4d6783f6256da73fe76-2649b36a23f49ed7.elb.us-east-2.amazonaws.com
        is not healthy
      observedGeneration: 3
      reason: waitingForAvailable
      status: "False"
      type: Available
    - lastTransitionTime: "2023-05-01T19:47:18Z"
      message: All is well
      reason: AWSSuccess
      status: "True"
      type: AWSEndpointAvailable
    - lastTransitionTime: "2023-05-01T19:47:18Z"
      message: All is well
      reason: AWSSuccess
      status: "True"
      type: AWSEndpointServiceAvailable
    - lastTransitionTime: "2023-05-01T19:44:11Z"
      message: Configuration passes validation
      observedGeneration: 3
      reason: AsExpected
      status: "True"
      type: ValidConfiguration
    - lastTransitionTime: "2023-05-01T19:44:11Z"
      message: HostedCluster is supported by operator configuration
      observedGeneration: 3
      reason: AsExpected
      status: "True"
      type: SupportedHostedCluster
    - lastTransitionTime: "2023-05-01T19:45:39Z"
      message: Ignition server deployment is available
      observedGeneration: 3
      reason: AsExpected
      status: "True"
      type: IgnitionEndpointAvailable
    - lastTransitionTime: "2023-05-01T19:44:11Z"
      message: Reconciliation active on resource
      observedGeneration: 3
      reason: AsExpected
      status: "True"
      type: ReconciliationActive
    - lastTransitionTime: "2023-05-01T19:44:12Z"
      message: Release image is valid
      observedGeneration: 3
      reason: AsExpected
      status: "True"
      type: ValidReleaseImage
    - lastTransitionTime: "2023-05-01T19:44:12Z"
      message: HostedCluster is at expected version
      observedGeneration: 3
      reason: AsExpected
      status: "False"
      type: Progressing
    - lastTransitionTime: "2023-05-01T19:44:13Z"
      message: OIDC configuration is valid
      observedGeneration: 3
      reason: AsExpected
      status: "True"
      type: ValidOIDCConfiguration
    - lastTransitionTime: "2023-05-01T19:44:13Z"
      message: Reconciliation completed succesfully
      observedGeneration: 3
      reason: ReconciliatonSucceeded
      status: "True"
      type: ReconciliationSucceeded
    - lastTransitionTime: "2023-05-01T19:45:52Z"
      message: All is well
      observedGeneration: 3
      reason: AsExpected
      status: "True"
      type: AWSDefaultSecurityGroupCreated

kube-apiserver log:
==================
E0501 19:45:07.024278       7 memcache.go:238] couldn't get current server API group list: Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_authorization-openshift_01_rolebindingrestriction.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_config-operator_01_proxy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_quota-openshift_01_clusterresourcequota.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_security-openshift_01_scc.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_03_securityinternal-openshift_02_rangeallocation.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_apiserver-Default.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_authentication.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_build.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_console.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_dns.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_featuregate.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_image.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_imagecontentpolicy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_imagecontentsourcepolicy.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_imagedigestmirrorset.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_imagetagmirrorset.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_infrastructure-Default.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_ingress.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_network.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_node.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_oauth.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_project.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused
unable to recognize "/work/0000_10_config-operator_01_scheduler.crd.yaml": Get "https://localhost:6443/api?timeout=32s": dial tcp [::1]:6443: connect: connection refused

Version-Release number of selected component (if applicable):

 

How reproducible:

always

Steps to Reproduce:

1. Create a PublicAndPrivate cluster

Actual results:

APIServer endpoint is not healthy, and HC condition Type 'Available' is False

Expected results:

APIServer endpoint should be healthy, and Type 'Available' should be True

Additional info:

 

This is a clone of issue OCPBUGS-18386. The following is the description of the original issue:

How reproducible:

Always

Steps to Reproduce:

1. the Kubernetes API introduces a new Pod Template parameter (`ephemeral`)
2. this parameter is not in the allowed list of the default SCC
3. customers are not allowed to edit the default SCCs, nor do we have a mechanism in place to update the built-in SCCs AFAIK
4. users of existing clusters cannot use the new parameter without manually creating SCCs and assigning them to service accounts themselves, which looks clunky. This is documented in https://access.redhat.com/articles/6967808

Actual results:

Users of existing clusters cannot use ephemeral volumes after an upgrade

Expected results:

Users of existing clusters *can* use ephemeral volumes after an upgrade


This is a clone of issue OCPBUGS-27094. The following is the description of the original issue:

Description of problem:

Based on component readiness data that compares success rates for those two particular tests, we are regressing ~7-10% between the current 4.15 master and 4.14.z (in other words, we made the product ~10% worse).

 

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-upi-serial/1720630313664647168

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-serial/1719915053026643968

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-upi-serial/1721475601161785344

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-serial/1724202075631390720

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-vsphere-ovn-upi-serial/1721927613917696000

These jobs and their failures are all caused by increased etcd leader elections disrupting seemingly unrelated test cases across the VSphere AMD64 platform.

Since this particular platform's business significance is high, I'm setting this as "Critical" severity.

Please get in touch with me or Dean West if more teams need to be pulled into investigation and mitigation.

 

Version-Release number of selected component (if applicable):

4.15 / master

How reproducible:

Component Readiness Board

Actual results:

The etcd leader elections are elevated. Some jobs indicate it is due to disk i/o throughput OR network overload. 

Expected results:

1. We NEED to understand what is causing this problem.
2. If we can mitigate this, we should.
3. If we cannot mitigate this, we need to document this or work with VSphere infrastructure provider to fix this problem.
4. We optionally need a way to measure how often this happens in our fleet so we can evaluate how bad it is.

Additional info:

 

Description of problem:

While updating a cluster to 4.12.11, which contains the bug fix for OCPBUGS-7999 (https://issues.redhat.com/browse/OCPBUGS-7999), the 4.12.z backport of OCPBUGS-2783 (https://issues.redhat.com/browse/OCPBUGS-2783), it seems that the older {Custom|Default}RouteSync{Degraded|Progressing} conditions are not cleaned up as they should be, as per the OCPBUGS-2783 resolution, while the newer ones are added.

Due to this, on an upgrade to 4.12.11 (or higher, until this bug is fixed), it is possible to hit a problem very similar to the one that led to OCPBUGS-2783 in the first place, but while upgrading to 4.12.11.

So, we need to do a proper cleanup of the older conditions.

Version-Release number of selected component (if applicable):

4.12.11 and higher

How reproducible:

Always, as far as the wrong conditions are concerned. It only leads to issues if one of the wrong conditions was in an unhealthy state.

Steps to Reproduce:

1. Upgrade
2.
3.

Actual results:

Both new (and correct) conditions plus older (and wrong) conditions.

Expected results:

Both new (and correct) conditions only.

Additional info:

The problem seems to be that the stale conditions controller is created [1] with a list that says `CustomRouteSync` and `DefaultRouteSync`, while that list should be `CustomRouteSyncDegraded`, `CustomRouteSyncProgressing`, `DefaultRouteSyncDegraded` and `DefaultRouteSyncProgressing`. I read the source code of the controller a bit and it seems that it does not accept prefixes but performs a literal comparison.

[1] - https://github.com/openshift/console-operator/blob/0b54727/pkg/console/starter/starter.go#L403-L404

Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-11788.

This is a clone of issue OCPBUGS-18569. The following is the description of the original issue:

We are seeing CNO pod restart flakes in HyperShift CI on the HyperShift control plane

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_hypershift/2967/pull-ci-openshift-hypershift-main-e2e-kubevirt-aws-ovn/1699008879737704448/artifacts/e2e-kubevirt-aws-ovn/run-e2e-local/artifacts/TestCreateCluster/namespaces/e2e-clusters-pvhd5-example-s6skm/core/pods/logs/cluster-network-operator-78fd774c97-7w7dg-cluster-network-operator-previous.log

W0905 11:42:53.359515       1 builder.go:106] graceful termination failed, controllers failed with error: failed to get infrastructure name: infrastructureName not set in infrastructure 'cluster'

The current backoff is set to retry.DefaultBackoff, which is appropriate for 409 conflicts and only retries for < 1s:

var DefaultBackoff = wait.Backoff{
	Steps:    4,
	Duration: 10 * time.Millisecond,
	Factor:   5.0,
	Jitter:   0.1,
}

Elsewhere in the codebase, retry.DefaultBackoff is used with retry.RetryOnConflict() where it is appropriate, but we need to retry for much longer here and much less frequently.
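
A minimal sketch of a longer, less frequent retry using retry.OnError from client-go; the step count, durations, and cap below are illustrative assumptions rather than the values of the actual fix:

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/util/retry"
)

// infraStatusBackoff is an illustrative, longer and less frequent backoff for
// waiting on the infrastructure status to be populated.
var infraStatusBackoff = wait.Backoff{
	Steps:    10,
	Duration: 1 * time.Second,
	Factor:   2.0,
	Jitter:   0.1,
	Cap:      30 * time.Second,
}

// getInfrastructureName keeps retrying fetch until it succeeds or the backoff
// is exhausted, treating every error (including "not set yet") as retriable.
func getInfrastructureName(fetch func() (string, error)) (string, error) {
	var name string
	err := retry.OnError(infraStatusBackoff,
		func(error) bool { return true },
		func() error {
			n, err := fetch()
			if err != nil {
				return err
			}
			name = n
			return nil
		})
	return name, err
}

func main() {
	// Dummy fetch that never succeeds, just to show the retry behaviour.
	name, err := getInfrastructureName(func() (string, error) {
		return "", fmt.Errorf("infrastructureName not set in infrastructure 'cluster'")
	})
	fmt.Println(name, err)
}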

Description of problem:

Some of the tests in OLM's e2e suite are run in CVO-managed namespaces and rely on CVO-managed resources, creating a possible race condition where the CVO will remove changes to the resources' spec

Version-Release number of selected component (if applicable):

4.15.0, 4.14.0, possibly more

How reproducible:

Occasionally

Steps to Reproduce:

1. Run OLM's e2e suite

Actual results:

This test will occasionally fail: End-to-end: [It] Operator Group CSV copy watching all namespaces

Expected results:

The test should not fail because of the CVO.

Additional info:

 

Description of problem:

Upgraded from 4.11.17 -> 4.12.0 rc3 and found (after successful upgrade) this repeating in Machine Config Operator logs:

2022-12-13T23:11:51.511167249Z W1213 23:11:51.511120       1 warnings.go:70] unknown field "spec.dns.metadata.creationTimestamp"
2022-12-13T23:11:51.511167249Z W1213 23:11:51.511140       1 warnings.go:70] unknown field "spec.dns.metadata.generation"
2022-12-13T23:11:51.511167249Z W1213 23:11:51.511143       1 warnings.go:70] unknown field "spec.dns.metadata.managedFields"
2022-12-13T23:11:51.511167249Z W1213 23:11:51.511146       1 warnings.go:70] unknown field "spec.dns.metadata.name"
2022-12-13T23:11:51.511167249Z W1213 23:11:51.511148       1 warnings.go:70] unknown field "spec.dns.metadata.resourceVersion"
2022-12-13T23:11:51.511167249Z W1213 23:11:51.511151       1 warnings.go:70] unknown field "spec.dns.metadata.uid"
2022-12-13T23:11:51.511167249Z W1213 23:11:51.511153       1 warnings.go:70] unknown field "spec.infra.metadata.creationTimestamp"
2022-12-13T23:11:51.511167249Z W1213 23:11:51.511155       1 warnings.go:70] unknown field "spec.infra.metadata.generation"
2022-12-13T23:11:51.511167249Z W1213 23:11:51.511157       1 warnings.go:70] unknown field "spec.infra.metadata.managedFields"
2022-12-13T23:11:51.511167249Z W1213 23:11:51.511159       1 warnings.go:70] unknown field "spec.infra.metadata.name"
2022-12-13T23:11:51.511167249Z W1213 23:11:51.511161       1 warnings.go:70] unknown field "spec.infra.metadata.resourceVersion"
2022-12-13T23:11:51.511211644Z W1213 23:11:51.511163       1 warnings.go:70] unknown field "spec.infra.metadata.uid"

Version-Release number of selected component (if applicable):

4.12.0-rc3
Platform agnostic installation 

How reproducible:

Just once (working with user outside RH)

Steps to Reproduce:

1. Install 4.11.17
2. Set candidate-4.12 upgrade channel
3. Initiate upgrade (apply admin ack as needed)
4. After upgrade, check Machine Config Operator logs

Actual results:

The upgrade went fine and I don't see any symptoms beyond the warnings repeating in the MCO log

Expected results:

I don't expect the warnings to be logged repeatedly 

Additional info:

 

Description of problem:

ci job "amd64-nightly-4.13-upgrade-from-stable-4.12-vsphere-ipi-proxy-workers-rhel8" failed at rhel node upgrade stage with following error:

TASK [openshift_node : Apply machine config] ***********************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/apply_machine_config.yml:68
Using module file /opt/python-env/ansible-core/lib64/python3.8/site-packages/ansible/modules/command.py
Pipelining is enabled.
<192.168.233.236> ESTABLISH SSH CONNECTION FOR USER: test
Escalation succeeded
fatal: [192.168.233.236]: FAILED! => {
    "changed": XXXX,
    "cmd": [
        "podman", "run", "-v", "/:/rootfs", "--pid=host", "--privileged", "--rm",
        "--entrypoint=/usr/bin/machine-config-daemon", "-ti",
        "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0110276ce82958a105cdd59028043bcdb1e5c33a77e550a13a1dc51aee08b032",
        "start", "--node-name", "ci-op-ssnlf4qb-1dacf-bbmqt-rhel-1",
        "--once-from", "/tmp/ansible.mlldlsm5/worker_ignition_config.json", "--skip-reboot"
    ],
    "delta": "0:00:00.974554",
    "end": "2023-07-26 19:36:56.827081",
    "msg": "non-zero return code",
    "rc": 255,
    "start": "2023-07-26 19:36:55.852527",
    "stderr_lines": [
        "time=\"2023-07-26T19:36:55-04:00\" level=warning msg=\"The input device is not a TTY. The --tty and --interactive flags might not work properly\""
    ],
    "stdout_lines": [
        "I0726 23:36:56.436283   27240 start.go:61] Version: v4.13.0-202307242035.p0.g7b54f1d.assembly.stream-dirty (7b54f1dcce4ea9f69f300d0e1cf2316def45bf72)",
        "I0726 23:36:56.437075   27240 daemon.go:478] not chrooting for source=rhel-8 target=rhel-8",
        "F0726 23:36:56.437240   27240 start.go:75] failed to re-exec: writing /rootfs/run/bin/machine-config-daemon: open /rootfs/run/bin/machine-config-daemon: text file busy"
    ]
}

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-07-26-101700

How reproducible:

always

Steps to Reproduce:

Found in CI:
1. Install a v4.13.6 cluster with a RHEL 8 node
2. Upgrade OCP successfully
3. Upgrade the RHEL node

Actual results:

RHEL node upgrade failed

Expected results:

RHEL node upgrade should succeed

Additional info:

job link: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.13-amd64-nightly-4.13-upgrade-from-stable-4.12-vsphere-ipi-proxy-workers-rhel8-p2-f28/1684288836412116992

Description of problem:

IBM VPC CSI Driver fails to provision volumes in a proxy cluster. (If I understand correctly) it seems the proxy is not injected because in our definition (https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/controller.yaml) we inject the proxy into csi-driver:
    config.openshift.io/inject-proxy: csi-driver
    config.openshift.io/inject-proxy-cabundle: csi-driver
but the container name is iks-vpc-block-driver in https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/controller.yaml#L153

I checked that the proxy is not defined in the controller pod or driver container ENV.
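
If the mismatch between the annotation value and the container name is indeed the root cause, a sketch of the kind of change one would expect in the controller asset (an assumption, not the merged fix) would be to point the annotations at the actual container name:

# Sketch only (assumed fix): make the inject-proxy annotation values match
# the real container name instead of "csi-driver".
metadata:
  annotations:
    config.openshift.io/inject-proxy: iks-vpc-block-driver
    config.openshift.io/inject-proxy-cabundle: iks-vpc-block-driver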

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

Always

Steps to Reproduce:

1. Create an IBM cluster with proxy settings
2. Create a PVC/pod with the IBM VPC CSI Driver

Actual results:

It fails to provision volumes

Expected results:

Provisioning volumes works well on a proxy cluster

Additional info:

 

Description of problem:

Availability requirement updates are disabled on the Edit PDB page; also, when the user tries to edit, it clears the current value, so the user has no idea what the current setting is.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-04-03-211601

How reproducible:

Always

Steps to Reproduce:

1. Go to the Deployment page -> Actions -> Add PodDisruptionBudget
2. On the 'Create PodDisruptionBudget' page, set the following fields and hit 'Create'
Name: example-pdb
Availability requirement:  maxUnavailable: 2
3. Make sure pdb/example-pdb is successfully created
$ oc get pdb
NAME          MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
example-pdb   N/A             2                 2                     99s
4. Go to the Deployment page again, Actions -> Edit PodDisruptionBudget

Actual results:

The 'Availability requirement' value is disabled from editing by default; when the user clicks 'maxUnavailable', the value is set to empty (the user has no idea what the original value was)

Expected results:

When editing a PDB, we should load the form with the current value, and the user should be able to update the values by default

Additional info:

 

Description of problem:

The Secret generated by CCO in STS mode is different from the one created by ccoctl on the command line.

ccoctl generates:

[default]
sts_regional_endpoints = regional
role_arn = arn:aws:iam::269733383066:role/jsafrane-1-5h8rm-openshift-cluster-csi-drivers-aws-efs-cloud-cre
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token

CCO generates:

sts_regional_endpoints = regional
role_arn = arn:aws:iam::269733383066:role/jsafrane-1-5h8rm-openshift-cluster-csi-drivers-aws-efs-cloud-cre
web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token 

IMO these two should be the same. AWS EFS CSI driver does not work without "[default]" at the beginning.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-11-092038

How reproducible:

Always

Steps to Reproduce:

1. Create a Manual mode, STS cluster in AWS.
2. Create a CredentialsRequest which provides .spec.cloudTokenPath and .spec.providerSpec.stsIAMRoleARN.
3. Observe that secret is created by CCO in the target namespace specified by the CredentialsRequest. 

Actual results:

The secret does not have [default] in the `data` content.

Expected results:

 

 

When we set the  k8s.ovn.org/node-primary-ifaddr annotation on the node, we simply take the first valid IP address we find on the node gateway. We exclude link-local addresses and those in internally reserved subnets (https://github.com/openshift/ovn-kubernetes/pull/1386). 

Now, we might have more than one "valid" IP address on the gateway, as observed in:
 https://bugzilla.redhat.com/show_bug.cgi?id=2081390#c11 , https://bugzilla.redhat.com/show_bug.cgi?id=2081390#c14

For instance, taken from a different cluster than in the linked BZ:

sh-4.4# ip a show br-ex
7: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 00:52:12:af:f3:53 brd ff:ff:ff:ff:ff:ff
inet6 fd69::2/125 scope global dadfailed tentative <---- masquerade IP, excluded
valid_lft forever preferred_lft forever
inet6 fd2e:6f44:5dd8:c956::4/128 scope global nodad deprecated <--- real node IP, included
valid_lft forever preferred_lft 0sec
inet6 fd2e:6f44:5dd8:c956::17/128 scope global dynamic noprefixroute <---added by keepalive, INCLUDED!!
valid_lft 3017sec preferred_lft 3017sec
inet6 fe80::252:12ff:feaf:f353/64 scope link noprefixroute <--- link local, excluded
valid_lft forever preferred_lft forever

Above we have fd2e:6f44:5dd8:c956::17, which is the ingress LB VIP added by keepalive.

We don't currently distinguish in the code between the node IP as in node.spec.IP and other IPs that might be added to br-ex by other components. 

Would it be a good idea to just set the node primary address annotation to match node.spec.IP?

Description of problem:

Customer has noticed that object count quotas ("count/*") do not work for certain objects in ClusterResourceQuotas. For example, the following ResourceQuota works as expected:

~~~
apiVersion: v1
kind: ResourceQuota
metadata:
[..]
spec:
  hard:
    count/routes.route.openshift.io: "900"
    count/servicemonitors.monitoring.coreos.com: "100"
    pods: "100"
status:
  hard:
    count/routes.route.openshift.io: "900"
    count/servicemonitors.monitoring.coreos.com: "100"
    pods: "100"
  used:
    count/routes.route.openshift.io: "0"
    count/servicemonitors.monitoring.coreos.com: "1"
    pods: "4"
~~~

However when using "count/servicemonitors.monitoring.coreos.com" in ClusterResourceQuotas, this does not work (note the missing "used"):

~~~
apiVersion: quota.openshift.io/v1
kind: ClusterResourceQuota
metadata:
[..]
spec:
  quota:
    hard:
      count/routes.route.openshift.io: "900"
      count/servicemonitors.monitoring.coreos.com: "100"
      count/simon.krenger.ch: "100"
      pods: "100"
  selector:
    annotations:
      openshift.io/requester: kube:admin
status:
  namespaces:
[..]
  total:
    hard:
      count/routes.route.openshift.io: "900"
      count/servicemonitors.monitoring.coreos.com: "100"
      count/simon.krenger.ch: "100"
      pods: "100"
    used:
      count/routes.route.openshift.io: "0"
      pods: "4"
~~~

This behaviour does not only apply to "servicemonitors.monitoring.coreos.com" objects, but also to other objects, such as:

- count/kafkas.kafka.strimzi.io: '0'
- count/prometheusrules.monitoring.coreos.com: '100'
- count/servicemonitors.monitoring.coreos.com: '100'

The debug output for kube-controller-manager shows the following entries, which may or may not be related:

~~~
$ oc logs kube-controller-manager-ip-10-0-132-228.eu-west-1.compute.internal | grep "servicemonitor"
I0511 15:07:17.297620 1 patch_informers_openshift.go:90] Couldn't find informer for monitoring.coreos.com/v1, Resource=servicemonitors
I0511 15:07:17.297630 1 resource_quota_monitor.go:181] QuotaMonitor using a shared informer for resource "monitoring.coreos.com/v1, Resource=servicemonitors"
I0511 15:07:17.297642 1 resource_quota_monitor.go:233] QuotaMonitor created object count evaluator for servicemonitors.monitoring.coreos.com
[..]
I0511 15:07:17.486279 1 patch_informers_openshift.go:90] Couldn't find informer for monitoring.coreos.com/v1, Resource=servicemonitors
I0511 15:07:17.486297 1 graph_builder.go:176] using a shared informer for resource "monitoring.coreos.com/v1, Resource=servicemonitors", kind "monitoring.coreos.com/v1, Kind=ServiceMonitor"
~~~

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.12.15

How reproducible:

Always

Steps to Reproduce:

1. On an OCP 4.12 cluster, create the following ClusterResourceQuota:

~~~
apiVersion: quota.openshift.io/v1
kind: ClusterResourceQuota
metadata:
  name: case-03509174
spec:
  quota: 
    hard:
      count/servicemonitors.monitoring.coreos.com: "100"
      pods: "100"
  selector:
    annotations: 
      openshift.io/requester: "kube:admin"
~~~

2. As "kubeadmin", create a new project and deploy one new ServiceMonitor, for example: 

~~~
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: simon-servicemon-2
  namespace: simon-1
spec:
  endpoints:
    - path: /metrics
      port: http
      scheme: http
  jobLabel: component
  selector:
    matchLabels:
      deployment: echoenv-1
~~~

Actual results:

The "used" field for ServiceMonitors is not populated in the ClusterResourceQuota for certain objects. It is unclear if these quotas are enforced or not

Expected results:

ClusterResourceQuota for ServiceMonitors is updated and enforced

Additional info:

* Must-gather for a cluster showing this behaviour (added debug for kube-controller-manager) is available here: https://drive.google.com/file/d/1ioEEHZQVHG46vIzDdNm6pwiTjkL9QQRE/view?usp=share_link
* Slack discussion: https://redhat-internal.slack.com/archives/CKJR6200N/p1683876047243989

We only intend (at the moment) to backport the pod identity webhook to 4.14.  The original text of the card is below:

 

---------------------------------------------------------------------------------

 

Description of problem:

Openshift 4.11 still relies on v0.1.0 of aws-pod-identity-webhook (used by openshift-cloud-credential-operator) which is already almost 3 years old and contains several issues that have been fixed in the meantime in the upstream AWS project (latest release 0.4.0): https://github.com/openshift/aws-pod-identity-webhook

 

We understand that it is a downstream fork, but we're interested in at least a specific backport. The specific issue the customer is facing is that setting `eks.amazonaws.com/sts-regional-endpoints: "true"` on a SA whose NS is labelled with `pod-identity-webhook/mutate=true` does not work.
See the issue fixed in 0.3.0: https://github.com/aws/amazon-eks-pod-identity-webhook/pull/120.

Version-Release number of selected component (if applicable):

- OpenShift Container Platform 4.11

How reproducible:

Explained in the pull request fix.

Steps to Reproduce:

Explained in the pull request fix.

Actual results:

- Setting `eks.amazonaws.com/sts-regional-endpoints: "true"` on a SA whose NS is labelled with `pod-identity-webhook/mutate=true` does not work

Expected results:

Backport the issue fixed in 0.3.0: https://github.com/aws/amazon-eks-pod-identity-webhook/pull/120.

Additional info:

 

 

 

 

There are Prometheus rules defined in the kubestate rules which trigger the `Kube*QuotaOvercommit` alerts.

These alerts are triggered when the sum of memory/CPU resource quotas for the default/kube-/openshift- namespaces exceeds the capacity of the cluster.

Since there are no quotas defined inside the default OCP projects, and customers are not expected to create any quota for the default OCP projects, these alerts do not add any value; it would be good to have them removed.

In case of SNO we should only have 1 replica for the hypershift operator pod.

Currently I'm seeing

oc --kubeconfig ~/kubeconfig get pod -n hypershift                                  
NAME                        READY   STATUS    RESTARTS   AGE
operator-54585fd87b-nd6pq   0/1     Pending   0          4d18h
operator-54585fd87b-p8lxg   1/1     Running   0          4d23h

The pending pod shows the following warning:

  Warning  FailedScheduling  27m (x3181 over 4d18h)  default-scheduler  0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules. preemption: 0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules..

OCP Version at Install Time: 4.11-fc.3
RHCOS Version at Install Time: 411.86.202206172255-0
Platform: vSphere
Architecture: x86_64

I'm trying to verify that the IPI installer uses UEFI when creating VMs on VMware, following https://github.com/coreos/coreos-assembler/pull/2762 (merged Mar 19).

However, the 4.11.0-fc.3 installer taken from https://mirror.openshift.com/pub/openshift-v4/clients/ocp-dev-preview/4.11.0-fc.3/openshift-install-linux.tar.gz still seems to use BIOS.

Reproducing:

1. Run openshift-install against a VMware vSphere cluster.
2. Wait for an OpenShift VM (bootstrap, control, or worker node) to show up in vCenter.
3. Go to the VM's boot options - the firmware is set to BIOS instead of UEFI, which was supposed to be set by default.

Description of problem:

On the command-line-tools page, the title is "Command line tools" instead of "Command Line Tools"

Version-Release number of selected component (if applicable):

 

How reproducible:

1/1

Steps to Reproduce:

1. Go to the command-line-tools page
2. Check the title

Actual results:

the title is "Command line tools"

Expected results:

the title should be "Command Line Tools"

Additional info:

 

Clone of OCPBUGS-7906, but for all the CSI drivers and operators other than shared resource. All Pods / containers that are part of the OCP platform should run on dedicated "management" CPUs (if configured), i.e. they should have the annotation 'target.workload.openshift.io/management:{"effect": "PreferredDuringScheduling"}'.

Enhancement: https://github.com/openshift/enhancements/blob/master/enhancements/workload-partitioning/management-workload-partitioning.md

So far nobody has run our cloud CSI drivers with CPU pinning enabled, so this bug is low priority. I checked LSO; it already has correct CPU pinning in all Pods, e.g. here.
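
For reference, a sketch of where that annotation sits on an operand's pod template (illustrative Deployment fragment, not taken from any particular driver):

# Illustrative Deployment fragment: the management workload annotation goes
# on the pod template metadata so the pods get pinned to management CPUs.
spec:
  template:
    metadata:
      annotations:
        target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'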

Description of problem:

Upgrading from 4.12 to 4.13 causes cpuset-configure.service to fail, because the `mkdir` of `/sys/fs/cgroup/cpuset/system` and `/sys/fs/cgroup/cpuset/machine.slice` wasn't persistent.

Version-Release number of selected component (if applicable):

 

How reproducible:

Extremely (probably for every upgrade to the NTO)

Steps to Reproduce:

1. Upgrade from 4.12
2. Service will fail...

Actual results:

 

Expected results:

Service should start/finish correctly

Additional info:

 

Description of problem:

[AWS EBS CSI Driver Operator] should not update the default storageclass annotation back after customers remove the default storageclass annotation

Version-Release number of selected component (if applicable):

Server Version: 4.14.0-0.nightly-2023-06-08-102710

How reproducible:

Always

Steps to Reproduce:

1. Install an aws openshift cluster
2. Create 6 extra storage classes (any SC is fine)
3. Overwrite all the SCs with storageclass.kubernetes.io/is-default-class=false and check that all the SCs are set as non-default
4. Overwrite all the SCs with storageclass.kubernetes.io/is-default-class=true
5. Loop steps 3-4 several times

Actual results:

Overwriting all the SCs with storageclass.kubernetes.io/is-default-class=false is sometimes reverted by the driver operator

Expected results:

Overwriting all the SCs with storageclass.kubernetes.io/is-default-class=false should always succeed

Additional info:

 

Description of the problem:

We get the disk serial from ghw, which gets it from looking at 2 udev properties.  There are a couple more recent udev properties that should be tried first, as lsblk does:

https://github.com/util-linux/util-linux/blob/36c52fd14b83e6f7eff9a565c426a1e21812403b/misc-utils/lsblk-properties.c#L122-L128

 

I have a PR open on ghw that should solve the issue.  We'll need to update our version of ghw once it's merged.

 

See more info in the ABI ticket: https://issues.redhat.com/browse/OCPBUGS-18174
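
A small sketch of the fallback idea: prefer the newer udev serial properties and only then fall back to the older ones. The property names and their order below are assumptions based on the linked lsblk source, not a confirmed list:

package main

import "fmt"

// serialFromUdev returns the first non-empty serial-related udev property,
// trying newer properties before the older ones, similar to what lsblk does.
// The property names and their order here are assumptions.
func serialFromUdev(props map[string]string) string {
	for _, key := range []string{
		"SCSI_IDENT_SERIAL",
		"ID_SCSI_SERIAL",
		"ID_SERIAL_SHORT",
		"ID_SERIAL",
	} {
		if v := props[key]; v != "" {
			return v
		}
	}
	return ""
}

func main() {
	// Example properties as they might appear in the udev database for a disk.
	props := map[string]string{"ID_SERIAL": "QEMU_HARDDISK_05abcd32e95a61a3"}
	fmt.Println(serialFromUdev(props))
}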

We've had several forum cases and bugs already where a restart of the CEO fixed issues that could have been resolved automatically by a liveness probe.

We previously traced these down to stuck/deadlocked controllers, missing timeouts in gRPC calls, and other issues we haven't been able to find yet. Since the list of failures that can happen is pretty large, we should add a liveness probe to the CEO that will periodically health check:

  • all controllers have been running sync at least once in the last 5/10 minutes
  • on failure, produce a goroutine dump to analyse what went wrong

This check should not indicate whether the etcd cluster itself is healthy, it's purely for the CEO itself.
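
A minimal sketch of the idea (illustrative only; names, thresholds, and wiring are assumptions, not the operator's actual code): each controller stamps a "last synced" time, and a /healthz handler, wired to the pod's livenessProbe, fails and dumps goroutines when any controller has gone quiet for too long.

package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"runtime/pprof"
	"sync"
	"time"
)

// syncTracker records when each controller last completed a sync.
type syncTracker struct {
	mu        sync.Mutex
	lastSync  map[string]time.Time
	maxSilent time.Duration
}

// MarkSynced is called by a controller at the end of every successful sync.
func (t *syncTracker) MarkSynced(controller string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.lastSync[controller] = time.Now()
}

// healthz fails (and dumps goroutines for later analysis) if any controller
// has not synced within the allowed window; otherwise it reports OK.
func (t *syncTracker) healthz(w http.ResponseWriter, _ *http.Request) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for name, last := range t.lastSync {
		if time.Since(last) > t.maxSilent {
			pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
			http.Error(w, fmt.Sprintf("controller %s has not synced since %s", name, last.Format(time.RFC3339)), http.StatusInternalServerError)
			return
		}
	}
	w.WriteHeader(http.StatusOK)
}

func main() {
	t := &syncTracker{
		lastSync:  map[string]time.Time{"exampleController": time.Now()}, // hypothetical controller name
		maxSilent: 10 * time.Minute,
	}
	http.HandleFunc("/healthz", t.healthz) // endpoint a kubelet livenessProbe would hit
	log.Fatal(http.ListenAndServe(":8080", nil))
}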

Description of problem:

Repository creation in the console asks for a mandatory secret and does not allow creating a repository even for a public git URL, which is weird. However, it works fine with the OCP CLI

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Create a Repository CR via the OpenShift console
2.
3.

Actual results:

It does not allow me to create the repository

Expected results:

We should be able to create the Repository CR

Additional info:

slack thread: https://redhat-internal.slack.com/archives/C6A3NV5J9/p1691057766516119

Description of problem:

Ever since the introduction of the latest invariants feature in origin, MicroShift is unable to run the conformance tests.
Failing invariants include load balancer, image registry and kube-apiserver (https://github.com/openshift/origin/blob/master/pkg/defaultinvariants/types.go#L48-L52) and they are tested for disruptions. These tests don't apply in MicroShift because some of those components don't exist, and none of them are HA.
Requiring the invariants without checking the platform breaks conformance testing in MicroShift.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Run `openshift-tests run openshift/conformance --provider none` with MicroShift kubeconfig.

Steps to Reproduce:

1. 
2.
3.

Actual results:

KUBECONFIG=~/.kube/config ./openshift-tests run openshift/conformance -v 2 --provider none
  Aug  3 11:37:39.859: INFO: MicroShift cluster with version: 4.14.0_0.nightly_2023_06_30_131338_20230703175041_1b2a630fc
I0803 11:37:39.859929    9250 test_setup.go:94] Extended test version v4.1.0-6883-g6ee9dc5
openshift-tests version: v4.1.0-6883-g6ee9dc5
  Aug  3 11:37:39.898: INFO: Enabling in-tree volume drivers
Attempting to pull tests from external binary...
Falling back to built-in suite, failed reading external test suites: unable to extract k8s-tests binary: failed reading ClusterVersion/version: the server could not find the requested resource (get clusterversions.config.openshift.io version)
  W0803 11:37:40.849399    9250 warnings.go:70] unknown field "spec.tls.externalCertificate"
Suite run returned error: [namespaces "openshift-image-registry" not found, the server could not find the requested resource (get infrastructures.config.openshift.io cluster)]
No manifest filename passed
error running options: [namespaces "openshift-image-registry" not found, the server could not find the requested resource (get infrastructures.config.openshift.io cluster)]error: [namespaces "openshift-image-registry" not found, the server could not find the requested resource (get infrastructures.config.openshift.io cluster)]

Expected results:

Tests running to completion.

Additional info:

A nice addition would be having additional presubmits in origin to run Microshift conformance to catch these things earlier.

Please review the following PR: https://github.com/openshift/machine-api-provider-aws/pull/62

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

TRT has unfortunately had to revert this breaking change to get CI and/or nightly payloads flowing again. 
The original PR was https://github.com/openshift/cluster-storage-operator/pull/381.
The revert PR: https://github.com/openshift/cluster-storage-operator/pull/384

The following evidence helped us pushing for the revert:
In the nightly payload runs, periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-sdn-bm has been consistently failing in the last three nightly payloads. But the run in the revert PR passed.

To restore your change, create a new PR that reverts the revert and layers additional separate commit(s) on top that addresses the problem.

Contact information for TRT is available at https://source.redhat.com/groups/public/atomicopenshift/atomicopenshift_wiki/how_to_contact_the_technical_release_team. Please reach out if you need assistance in relanding your change or have feedback about this process.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:
Navigation:
Workloads -> Deployments -> Edit update strategy
'greater than pod' is in English

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-06-23-044003

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:
Translation missing

Expected results:
Translation should appear

Additional info:

Please review the following PR: https://github.com/openshift/machine-api-provider-nutanix/pull/42

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Bare metal IPI jobs have been failing in 4.14 CI since May 12th.

bootkube is failing to start with:

 

May 15 10:11:56 localhost.localdomain systemd[1]: Started Bootstrap a Kubernetes cluster.
May 15 10:12:04 localhost.localdomain bootkube.sh[82661]: Rendering Kubernetes Controller Manager core manifests...
May 15 10:12:09 localhost.localdomain bootkube.sh[84029]: F0515 10:12:09.396398       1 render.go:45] error getting FeatureGates: error creating feature accessor: unable to determine features: missing desired version "4.14.0-0.nightly-2023-05-12-121801" in featuregates.config.openshift.io/cluster
May 15 10:12:09 localhost.localdomain systemd[1]: bootkube.service: Main process exited, code=exited, status=255/EXCEPTION
May 15 10:12:09 localhost.localdomain systemd[1]: bootkube.service: Failed with result 'exit-code'.

When we update a Secret referenced in the BareMetalHost, an immediate reconcile of the corresponding BMH is not triggered. In most states we requeue each CR after a timeout, so we should eventually see the changes.

In the case of BMC Secrets, this has been broken since the fix for OCPBUGS-1080 in 4.12.
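A hedged sketch of one possible shape for the fix, assuming controller-runtime v0.15+ (the Watches builder signature changed there) and an invented annotation key; the real baremetal-operator would more likely map via the BMH credential references or an index:

```
// Illustrative only: map updated Secrets back to their BareMetalHost so the
// BMH is reconciled immediately instead of waiting for the periodic requeue.
package hostcontroller

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"

	metal3v1alpha1 "github.com/metal3-io/baremetal-operator/apis/metal3.io/v1alpha1"
)

// secretToHost assumes, hypothetically, that consumed Secrets carry an
// annotation naming the owning host.
func secretToHost(ctx context.Context, obj client.Object) []reconcile.Request {
	host, ok := obj.GetAnnotations()["example.metal3.io/owned-by-host"] // assumed key
	if !ok {
		return nil
	}
	return []reconcile.Request{{NamespacedName: types.NamespacedName{
		Namespace: obj.GetNamespace(), Name: host,
	}}}
}

func addSecretWatch(mgr ctrl.Manager, r reconcile.Reconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&metal3v1alpha1.BareMetalHost{}).
		Watches(&corev1.Secret{}, handler.EnqueueRequestsFromMapFunc(secretToHost)).
		Complete(r)
}
```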

This is a clone of issue OCPBUGS-18800. The following is the description of the original issue:

Description of problem:

Currently the MCO updates its image registry certificate ConfigMap by deleting and re-creating it on each MCO sync. Instead, we should patch it.
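A minimal client-go sketch of the suggested direction (the ConfigMap name, namespace, and data key below are placeholders, not the MCO's actual objects):

```
// Sketch: apply the updated registry CA bundle with a merge patch rather than
// delete + create, so watchers never observe the ConfigMap disappearing.
package example

import (
	"context"
	"encoding/json"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

func patchRegistryCABundle(ctx context.Context, cs kubernetes.Interface, caBundle string) error {
	patch, err := json.Marshal(map[string]interface{}{
		"data": map[string]string{"ca-bundle.crt": caBundle}, // placeholder key
	})
	if err != nil {
		return err
	}
	// Names are illustrative only.
	_, err = cs.CoreV1().ConfigMaps("openshift-config-managed").Patch(
		ctx, "image-registry-ca", types.MergePatchType, patch, metav1.PatchOptions{})
	return err
}
```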

Version-Release number of selected component (if applicable):

4.14

How reproducible:

always

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

First showed on https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-08-16-042125

Did not appear to happen on https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-08-15-200133

The changelog is getting huge, but I diffed these two payloads:

❯ diff 1.txt 2.txt 
2a3
>     Use go 1.18 when setting up environment (#5422) #5422
15a17
>     CFE-688: Update install-config CRD to support gcp labels and tags #7126
23a26,27
>     OCPBUGS-17711: Revert "pkg/cli/admin/release/extract: Add --included and --install-config" #1527
>     Update openshift/api #1525
28a33
>     pkg/aws/actuator: Drop comment which suggested passthrough permission verification #590
49a55,59
> cluster-control-plane-machine-set-operator
> 
>     OCPCLOUD-2130: Add subnet to Azure FD, fix for optional fields in FD #229
>     Full changelog
> 
64a75
>     IR-373: remove node-ca daemon #867
126a138,147
> cluster-storage-operator
> 
>     STOR-1274: use granular permissons for Azure credential requests #388
>     Full changelog
> 
> cluster-version-operator
> 
>     CNF-9385: add ImageRegistry capability #950
>     Full changelog
> 
132a154,158
> container-networking-plugins
> 
>     OCPBUGS-17681: Default CNI binaries to RHEL 8 #116
>     Full changelog
> 
143a170,174
> haproxy-router
> 
>     OCPBUGS-17653: haproxy/template: mitigate CVE-2023-40225 #505
>     Full changelog
> 
193a225,229
> monitoring-plugin
> 
>     OCPBUGS-17650: Fix /monitoring/ redirects #68
>     Full changelog
> 
204a241,245
> openstack-machine-api-provider
> 
>     Bump CAPO to match branch release-0.7 #80
>     Full changelog
> 
206a248,249
>     OCPBUGS-17157: scripts: add a Go-based bumper, sync upstream #534
>     Add ncdc to DOWNSTREAM_OWNERS #539
223a267
>     update watch-endpoint-slices to usable shape #28184

Description of the problem:

When starting an installation where the nodes have multiple disks on 4.13, after reboot the installation might get stuck on "pending user action" with the following error:

Expected the host to boot from disk, but it booted the installation image - please reboot and fix the boot order to boot from disk QEMU_HARDDISK 05abcd32e95a61a3 (sda, /dev/disk/by-id/wwn-0x05abcd32e95a61a3). 

 

When running the live ISO with RHEL, /dev/sda might actually be vdb.
Since the boot order configuration is usually HD first, the machine usually tries vda before it moves on to try other boot options (that are not HD).
When installing on /dev/sda (vdb) the machine might not try to boot from the installation disk.

Solution suggestion:
A better way to find vda is by the HCTL address (0:0:0:0 should be /dev/vda).
Action item: in case of libvirt (why not all platforms?) we should update the way we choose the default installation disk and pick the disk with HCTL 0:0:0:0 (when it's available...), as sketched below.
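A tiny illustration of the suggested selection rule (not the assisted-service code; the Disk type is invented for the example): prefer the disk whose HCTL address is 0:0:0:0 when one exists, otherwise keep the current default.

```
package example

// Disk is a stand-in for the inventory disk record used during installation.
type Disk struct {
	Name string // e.g. "sda"
	Hctl string // e.g. "0:0:0:0"
}

// defaultInstallationDisk prefers the disk at HCTL 0:0:0:0 (first disk on the
// first SCSI host), falling back to the existing default otherwise.
func defaultInstallationDisk(disks []Disk, current *Disk) *Disk {
	for i := range disks {
		if disks[i].Hctl == "0:0:0:0" {
			return &disks[i]
		}
	}
	return current
}
```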

 

How reproducible:

Create nodes with 2 disks and start installation.

 

Steps to reproduce:

1. Register new cluster

2. Add 6 nodes (3 master + 3 workers) with multiple disks each - might be even reproducible with only 3 masters

3. Start the installation

 

Note that it might take a few attempts to reproduce this issue

 

Actual results:

Installation is stuck pending user action

 

Expected results:

Installation succeeds

 

Slack thread https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1684317064257809

 

 

 

Please review the following PR: https://github.com/openshift/cluster-api-provider-alibaba/pull/41

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-22385. The following is the description of the original issue:

Description of problem:

Currently, vmware-vsphere-csi-driver-webhook exposes HTTP/2 endpoints:

$ oc -n openshift-cluster-csi-drivers exec deployment/vmware-vsphere-csi-driver-webhook -- curl -kv   https://localhost:8443/readyz

...
* ALPN, server accepted to use h2
> GET /readyz HTTP/2
< HTTP/2 404

To err on the side of caution, we should discontinue the handling of HTTP/2 requests.
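For a Go webhook server, one common way to do this is to leave the server's TLSNextProto map empty, which prevents net/http from configuring HTTP/2; this is a generic sketch, not the vSphere webhook's actual code, and the certificate paths are placeholders.

```
package main

import (
	"crypto/tls"
	"log"
	"net/http"
)

func main() {
	srv := &http.Server{
		Addr: ":8443",
		// A non-nil, empty TLSNextProto map tells net/http not to set up
		// HTTP/2, so only HTTP/1.1 is served over TLS.
		TLSNextProto: map[string]func(*http.Server, *tls.Conn, http.Handler){},
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			w.WriteHeader(http.StatusOK)
		}),
	}
	log.Fatal(srv.ListenAndServeTLS("tls.crt", "tls.key")) // placeholder paths
}
```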

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1. oc -n openshift-cluster-csi-drivers exec deployment/vmware-vsphere-csi-driver-webhook -- curl -kv https://localhost:8443/readyz
2.
3.

Actual results:

HTTP/2 requests are accepted

Expected results:

HTTP/2 requests shouldn't be accepted by the webhook

Additional info:

 

From wking:

$ git --no-pager grep OCPBUGS-10218
test/e2e/nodepool_test.go: // TODO: (csrwng) Re-enable when https://issues.redhat.com/browse/OCPBUGS-10218 is fixed
test/e2e/nodepool_test.go: // TODO: (jparrill) Re-enable when https://issues.redhat.com/browse/OCPBUGS-10218 is fixed
but https://issues.redhat.com/browse/OCPBUGS-10218 was closed as a dup of https://issues.redhat.com/browse/OCPBUGS-10485 , and OCPBUGS-10485 is Verified with happy sounds for both 4.13 and 4.14 nightlies
 

Description of problem

CI is flaky because the TestAWSELBConnectionIdleTimeout test fails. Example failures:

Version-Release number of selected component (if applicable)

I have seen these failures in 4.14 and 4.13 CI jobs.

How reproducible

Presently, search.ci reports the following stats for the past 14 days:

Found in 1.24% of runs (3.52% of failures) across 404 total runs and 34 jobs (35.15% failed)

This includes two jobs:

  • pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator (all) - 40 runs, 63% failed, 16% of failures match = 10% impact
  • pull-ci-openshift-cluster-ingress-operator-release-4.13-e2e-aws-operator (all) - 10 runs, 70% failed, 14% of failures match = 10% impact

Steps to Reproduce

1. Post a PR and have bad luck.
2. Check https://search.ci.openshift.org/?search=FAIL%3A+TestAll%2Fparallel%2FTestAWSELBConnectionIdleTimeout&maxAge=336h&context=1&type=all&name=cluster-ingress-operator&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job.

Actual results

The test fails because it times out waiting for DNS to resolve:

=== RUN   TestAll/parallel/TestAWSELBConnectionIdleTimeout
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2650: lookup idle-timeout-httpd-openshift-ingress.test-idle-timeout.ci-op-sh28dt25-08f48.origin-ci-int-aws.dev.rhcloud.com on 172.30.0.10:53: no such host
    operator_test.go:2656: failed to observe expected condition: timed out waiting for the condition
    panic.go:522: deleted ingresscontroller test-idle-timeout

The above output comes from build-log.txt from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/917/pull-ci-openshift-cluster-ingress-operator-release-4.13-e2e-aws-operator/1658840125502656512.

Expected results

CI passes, or it fails on a different test.

Description of problem:

Due to a security vulnerability[1] affecting Azure CLI versions prior to 2.40.0 (exclusive), it is recommended to update the Azure CLI to a newer version to avoid this issue. Currently, the Azure CLI in CI is 2.38.0.

[1] https://github.com/Azure/azure-cli/security/advisories/GHSA-47xc-9rr2-q7p4

Version-Release number of selected component (if applicable):

All supported version

How reproducible:

Always

Steps to Reproduce:

1. Trigger CI jobs on azure platform that contains steps using azure cli.
2. 
3.

Actual results:

azure cli 2.38.0 is used now.

Expected results:

Azure CLI 2.40.0+ to be used in CI on all supported versions

Additional info:

As Azure CLI 2.40.0+ is only available in the RHEL 8-based repository, its repo needs to be updated in the upi-installer RHEL 8-based Dockerfile[1]

[1] https://github.com/openshift/installer/blob/master/images/installer/Dockerfile.upi.ci.rhel8#L23

Description of problem:

The following test in the 4.13 release periodic job fails in about 30% of jobs:
[rfe_id:27363][performance] CPU Management Hyper-thread aware scheduling for guaranteed pods Verify Hyper-Thread aware scheduling for guaranteed pods [test_id:46959] Number of CPU requests as multiple of SMT count allowed when HT enabled

Version-Release number of selected component (if applicable):

4.13

How reproducible:

In periodic jobs

Steps to Reproduce:

Run cnf tests on 4.13

Actual results:

 

Expected results:

 

Additional info:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.13-e2e-telco5g-cnftests/1628395172440051712/artifacts/e2e-telco5g-cnftests/telco5g-cnf-tests/artifacts/test_results.html

Sanitize OWNERS/OWNER_ALIASES:

1) OWNERS must have:

component: "Storage / Kubernetes External Components"

2) OWNER_ALIASES must have all team members of the Storage team.

Description of problem:

Placeholder bug to backport common latency failures

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-22459. The following is the description of the original issue:

Description of problem:

In HyperShift 4.14, the konnectivity server is run inside the kube-apiserver pod. When this pod is deleted for any reason, the konnectivity server container can drop before the rest of the pod terminates, which can cause network connections to drop. The following preStop definition can be added to the container to ensure it stays alive long enough for the rest of the pod to clean up.

lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - sleep 70

Version-Release number of selected component (if applicable):

4.14

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-18954. The following is the description of the original issue:

Description of problem:

While installing 3618 SNOs via ZTP using ACM 2.9, 15 clusters failed to complete install and have failed on the cluster-autoscaler operator. This represents the bulk of all cluster install failures in this testbed for OCP 4.14.0-rc.0.


# cat aci.InstallationFailed.autoscaler  | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion --no-headers "
vm00527 version         False   True   20h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm00717 version         False   True   14h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm00881 version         False   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm00998 version         False   True   18h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm01006 version         False   True   17h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm01059 version         False   True   15h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm01155 version         False   True   14h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm01930 version         False   True   17h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm02407 version         False   True   16h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm02651 version         False   True   18h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03073 version         False   True   19h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03258 version         False   True   20h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03295 version         False   True   14h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03303 version         False   True   15h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available
vm03517 version         False   True   18h   Unable to apply 4.14.0-rc.0: the cluster operator cluster-autoscaler is not available

Version-Release number of selected component (if applicable):

Hub 4.13.11
Deployed SNOs 4.14.0-rc.0
ACM 2.9 - 2.9.0-DOWNSTREAM-2023-09-07-04-47-52

How reproducible:

15 out of 20 failures (75% of the failures)
15 out of 3618 total attempted SNO installs (~0.4% of all installs)

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

It appears that some of them show an error in the cluster-autoscaler-operator logs. Example:

I0912 19:54:39.962897       1 main.go:15] Go Version: go1.20.5 X:strictfipsruntime
I0912 19:54:39.962977       1 main.go:16] Go OS/Arch: linux/amd64
I0912 19:54:39.962982       1 main.go:17] Version: cluster-autoscaler-operator v4.14.0-202308301903.p0.gb57f5a9.assembly.stream-dirty
I0912 19:54:39.963137       1 leaderelection.go:122] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
I0912 19:54:39.975478       1 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"="127.0.0.1:9191"
I0912 19:54:39.976939       1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-clusterautoscalers"
I0912 19:54:39.976984       1 server.go:187] controller-runtime/webhook "msg"="Registering webhook" "path"="/validate-machineautoscalers"
I0912 19:54:39.977082       1 main.go:41] Starting cluster-autoscaler-operator
I0912 19:54:39.977216       1 server.go:216] controller-runtime/webhook/webhooks "msg"="Starting webhook server" 
I0912 19:54:39.977693       1 certwatcher.go:161] controller-runtime/certwatcher "msg"="Updated current TLS certificate" 
I0912 19:54:39.977813       1 server.go:273] controller-runtime/webhook "msg"="Serving webhook server" "host"="" "port"=8443
I0912 19:54:39.977938       1 certwatcher.go:115] controller-runtime/certwatcher "msg"="Starting certificate watcher" 
I0912 19:54:39.978008       1 server.go:50]  "msg"="starting server" "addr"={"IP":"127.0.0.1","Port":9191,"Zone":""} "kind"="metrics" "path"="/metrics"
I0912 19:54:39.978052       1 leaderelection.go:245] attempting to acquire leader lease openshift-machine-api/cluster-autoscaler-operator-leader...
I0912 19:54:39.982052       1 leaderelection.go:255] successfully acquired lease openshift-machine-api/cluster-autoscaler-operator-leader
I0912 19:54:39.983412       1 controller.go:177]  "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.ClusterAutoscaler"
I0912 19:54:39.983462       1 controller.go:177]  "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.Deployment"
I0912 19:54:39.983483       1 controller.go:177]  "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.Service"
I0912 19:54:39.983501       1 controller.go:177]  "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.ServiceMonitor"
I0912 19:54:39.983520       1 controller.go:177]  "msg"="Starting EventSource" "controller"="cluster_autoscaler_controller" "source"="kind source: *v1.PrometheusRule"
I0912 19:54:39.983532       1 controller.go:185]  "msg"="Starting Controller" "controller"="cluster_autoscaler_controller"
I0912 19:54:39.986041       1 controller.go:177]  "msg"="Starting EventSource" "controller"="machine_autoscaler_controller" "source"="kind source: *v1beta1.MachineAutoscaler"
I0912 19:54:39.986065       1 controller.go:177]  "msg"="Starting EventSource" "controller"="machine_autoscaler_controller" "source"="kind source: *unstructured.Unstructured"
I0912 19:54:39.986072       1 controller.go:185]  "msg"="Starting Controller" "controller"="machine_autoscaler_controller"
I0912 19:54:40.095808       1 webhookconfig.go:72] Webhook configuration status: created
I0912 19:54:40.101613       1 controller.go:219]  "msg"="Starting workers" "controller"="cluster_autoscaler_controller" "worker count"=1
I0912 19:54:40.102857       1 controller.go:219]  "msg"="Starting workers" "controller"="machine_autoscaler_controller" "worker count"=1
E0912 19:58:48.113290       1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": net/http: TLS handshake timeout - error from a previous attempt: unexpected EOF
E0912 20:02:48.135610       1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp [fd02::1]:443: connect: connection refused
E0913 13:49:02.118757       1 leaderelection.go:327] error retrieving resource lock openshift-machine-api/cluster-autoscaler-operator-leader: Get "https://[fd02::1]:443/apis/coordination.k8s.io/v1/namespaces/openshift-machine-api/leases/cluster-autoscaler-operator-leader": dial tcp [fd02::1]:443: connect: connection refused


Description of problem:

CNO panics with net/http: abort Handler while installing SNO cluster on OpenshiftSDN

network                                    4.14.0-0.nightly-2023-07-05-191022   True        False         True       9h      Panic detected: net/http: abort Handler

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-05-191022

How reproducible:

sometimes

Steps to Reproduce:

1.Install OpenshiftSDN cluster on SNO
2.
3.

Actual results:

Cluster (CNO) reports errors

Expected results:

Cluster should be installed fine

Additional info:

SOS: http://shell.lab.bos.redhat.com/~anusaxen/sosreport-rg-0707-tl6fd-master-0-2023-07-07-pyaruar.tar.xz

MG:  http://shell.lab.bos.redhat.com/~anusaxen/must-gather.local.4340060474822893433/

Description of problem:

On clusters without the TechPreview feature set enabled, machines are failing to delete due to an attempt to list an IPAM that is not installed.

Version-Release number of selected component (if applicable):

4.14 nightly

How reproducible:

consistently

Steps to Reproduce:

1. Create a platform vSphere cluster
2. Scale down a machine

Actual results:

Machine fails to delete

Expected results:

Machine should delete

Additional info:

Fails with unable to list IPAddressClaims: failed to get API group resources: unable to retrieve the complete list of server APIs: ipam.cluster.x-k8s.io/v1alpha1: the server could not find the requested resource

Description of problem:

When we rebased to 1.26, the rebase picked up https://github.com/kubernetes-sigs/cloud-provider-azure/pull/2653/ which made the Azure cloud node manager stop applying beta topology labels, such as failure-domain.beta.kubernetes.io/zone

Since we haven't completed the removal cycle for this, we still need the node manager to apply these labels. In the future we must ensure that these labels are available until users are no longer using them.

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

1. Create a TP cluster on 4.13
2. Observe no beta label for zone or region
3.

Actual results:

Beta labels are not present

Expected results:

Beta labels are present and should match GA labels

Additional info:

Created https://github.com/kubernetes-sigs/cloud-provider-azure/pull/3685 to try and make upstream allow this to be flagged

Description of problem:

When CNO is managed by HyperShift, its deployment has the "hypershift.openshift.io/release-image" template metadata annotation. The annotation's value is used to track the progress of cluster control plane version upgrades. But the multus-admission-controller created and managed by CNO does not have that annotation, so service providers are not able to track its version upgrades.

The proposed solution is for CNO to propagate its "hypershift.openshift.io/release-image" annotation down to the multus-admission-controller deployment. For that, CNO needs to have "get" access to its own deployment manifest to be able to read the deployment template metadata annotations.

HyperShift needs a code change to assign CNO "get" permission on the CNO deployment object.
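A rough client-go sketch of the propagation step only (the deployment names and namespace handling are assumptions, and the RBAC change described above is handled separately):

```
// Sketch: copy the release-image annotation from the CNO Deployment template
// onto the multus-admission-controller Deployment template.
package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const releaseImageAnnotation = "hypershift.openshift.io/release-image"

func propagateReleaseImageAnnotation(ctx context.Context, cs kubernetes.Interface, namespace string) error {
	cno, err := cs.AppsV1().Deployments(namespace).Get(ctx, "cluster-network-operator", metav1.GetOptions{}) // name assumed
	if err != nil {
		return err
	}
	value, ok := cno.Spec.Template.Annotations[releaseImageAnnotation]
	if !ok {
		return nil // nothing to propagate
	}
	mac, err := cs.AppsV1().Deployments(namespace).Get(ctx, "multus-admission-controller", metav1.GetOptions{})
	if err != nil {
		return err
	}
	if mac.Spec.Template.Annotations == nil {
		mac.Spec.Template.Annotations = map[string]string{}
	}
	mac.Spec.Template.Annotations[releaseImageAnnotation] = value
	_, err = cs.AppsV1().Deployments(namespace).Update(ctx, mac, metav1.UpdateOptions{})
	return err
}
```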

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1.Create OCP cluster using Hypershift
2.Check deployment template metadata annotations on multus-admission-controller

Actual results:

No "hypershift.openshift.io/release-image" deployment template metadata annotation exists 

Expected results:

"hypershift.openshift.io/release-image" annotation must be present

Additional info:

 

Description of problem:

Cluster Provisioning fails with the message:
Internal error: failed to fetch instance type, this error usually occurs if the region or the instance type is not found

This is likely because OCM uses GCP custom machine types, for example custom-4-16384, and the installer now validates machine types against the per-zone machine type listings (see the GetMachineTypeWithZones function), which do not include custom machine types.

See https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type#gcloud for more details.
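An illustrative sketch (not the installer's actual validation code) of how per-zone instance-type validation could skip custom machine types, which follow the custom-<vCPUs>-<memoryMiB> pattern and do not appear in the per-zone listings:

```
package example

import (
	"fmt"
	"regexp"
)

// Matches custom machine types such as "custom-4-16384" or "n2-custom-8-32768".
var customMachineTypeRe = regexp.MustCompile(`^([a-z0-9]+-)?custom-\d+-\d+(-ext)?$`)

// validateMachineType stands in for the per-zone lookup: custom types are
// accepted without consulting the zone listing.
func validateMachineType(machineType string, knownPerZone map[string]bool) error {
	if customMachineTypeRe.MatchString(machineType) {
		return nil // skip per-zone lookup for custom machine types
	}
	if !knownPerZone[machineType] {
		return fmt.Errorf("instance type %q not found in zone listing", machineType)
	}
	return nil
}
```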

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. ocm create cluster cluster001 --provider=gcp --ccs=true --region=us-east1 --service-account-file=token.json --version="4.14.0-0.nightly-2023-08-02-102121-nightly"
2.

Actual results:

Cluster installation fails 

Expected results:

Cluster installation succeeds

Additional info:

 

When ProjectID is not set, TenantID might be ignored in MAPO.

Context: When setting additional networks in Machine templates, networks can be identified by means of a filter. The network filter has both TenantID and ProjectID as fields. TenantID was being ignored.

Steps to reproduce:
Create a Machine or a MachineSet with a template containing a Network filter that sets a TenantID.

```
networks:
- filter:
    id: 'the-network-id'
    tenantId: '123-123-123'
```

One cheap way of testing this could be to pass a valid network ID and set a bogus tenantID. If the machine gets associated with the network, then tenantID has been ignored and the bug is present. If instead MAPO errors, then it means that it has taken tenantID into consideration.
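A hedged sketch of the shape of the fix, assuming gophercloud's networks.ListOpts (which exposes both TenantID and ProjectID); the NetworkFilter struct here is illustrative, not MAPO's actual type:

```
package example

import "github.com/gophercloud/gophercloud/openstack/networking/v2/networks"

// NetworkFilter mirrors the relevant fields of the Machine template filter.
type NetworkFilter struct {
	ID        string
	ProjectID string
	TenantID  string
}

// toListOpts carries over TenantID as well as ProjectID, so a filter that
// only sets tenantId is no longer silently ignored.
func toListOpts(f NetworkFilter) networks.ListOpts {
	return networks.ListOpts{
		ID:        f.ID,
		ProjectID: f.ProjectID,
		TenantID:  f.TenantID,
	}
}
```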

Description of problem:

If a custom API server certificate is added as per documentation[1], but the secret name is wrong and points to a non-existing secret, the following happens:
- The kube-apiserver config is rendered with some of the namedCertificates pointing to /etc/kubernetes/static-pod-certs/secrets/user-serving-cert-000/
- As the secret in apiserver/cluster object is wrong, no user-serving-cert-000 secret is generated, so the /etc/kubernetes/static-pod-certs/secrets/user-serving-cert-000/ does not exist (and may be automatically removed if manually created).
- The combination of the 2 points above causes kube-apiserver to start crash-looping because its config points to non-existent certificates.

This is a cluster-kube-apiserver-operator bug, because it should validate that the specified secret exists and, if it doesn't, degrade and do nothing rather than render an inconsistent configuration.
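A simplified sketch of the desired validate-then-degrade behaviour (plain client-go; the condition wiring and names are assumptions, not the operator's actual code):

```
package example

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// validateNamedCertSecret checks that a namedCertificates secret exists in
// openshift-config before it is rendered into the static pod config; a
// non-nil error would be surfaced as a Degraded condition instead of being
// rendered as a path under user-serving-cert-*.
func validateNamedCertSecret(ctx context.Context, cs kubernetes.Interface, secretName string) error {
	_, err := cs.CoreV1().Secrets("openshift-config").Get(ctx, secretName, metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return fmt.Errorf("namedCertificates references secret %q in openshift-config, but it does not exist", secretName)
	}
	return err
}
```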

Version-Release number of selected component (if applicable):

First found in 4.11.13, but also reproduced in the latest nightly build.

How reproducible:

Always

Steps to Reproduce:

1. Setup a named certificate pointing to a secret that doesn't exist.
2.
3.

Actual results:

Inconsistent configuration that points to non-existing secret. Kube API server pod crash-loop.

Expected results:

Cluster Kube API Server Operator to detect that the secret is wrong, do nothing and only report itself as degraded with meaningful message so the user can fix. No Kube API server pod crash-looping.

Additional info:

Once the kube-apiserver is broken, even if the apiserver/cluster object is fixed, it is usually necessary to apply a manual workaround on the crash-looping master. An example of a workaround that works is [2], even though that KB article was written for another bug with a different root cause.

References:

[1] - https://docs.openshift.com/container-platform/4.11/security/certificates/api-server.html#api-server-certificates
[2] - https://access.redhat.com/solutions/4893641

Description of problem:

Since the introduction of https://github.com/openshift/origin/pull/27570, the openshift-tests binary looks up the cluster infrastructure resource for later use (setting the TEST_PROVIDER env var when running the run-test command to inject details about the cluster). Since MicroShift does not have this resource, the returned value is nil and the binary panics when it is used later in the code.
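A minimal sketch of the kind of guard implied here, using the openshift/client-go config client; the fallback behaviour is an assumption for illustration:

```
// Sketch: tolerate clusters (e.g. MicroShift) that have no Infrastructure
// resource instead of carrying a nil pointer into later code paths.
package example

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	configv1client "github.com/openshift/client-go/config/clientset/versioned/typed/config/v1"
)

func clusterPlatform(ctx context.Context, cfg configv1client.ConfigV1Interface) (string, error) {
	infra, err := cfg.Infrastructures().Get(ctx, "cluster", metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return "", nil // no infrastructures.config.openshift.io on MicroShift
	}
	if err != nil {
		return "", err
	}
	if infra.Status.PlatformStatus == nil {
		return "", nil
	}
	return string(infra.Status.PlatformStatus.Type), nil
}
```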

Version-Release number of selected component (if applicable):

 

How reproducible:

Run openshift-tests and it immediately panics

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Following https://bugzilla.redhat.com/show_bug.cgi?id=2102765 and https://issues.redhat.com/browse/OCPBUGS-2140, problems with OpenID Group sync have been resolved.

Yet the problem documented in https://bugzilla.redhat.com/show_bug.cgi?id=2102765 still exists: Groups that have been removed are still part of the cache in oauth-apiserver, causing a panic of the respective components and login failures for potentially affected users.

So in general, it looks like the oauth-apiserver cache is not properly refreshing or handling the OpenID Groups being synced.

E1201 11:03:14.625799       1 runtime.go:76] Observed a panic: interface conversion: interface {} is nil, not *v1.Group
goroutine 3706798 [running]:
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1.1()
    k8s.io/apiserver@v0.22.2/pkg/server/filters/timeout.go:103 +0xb0
panic({0x1aeab00, 0xc001400390})
    runtime/panic.go:838 +0x207
k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1.1.1()
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/audit.go:80 +0x2a
k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1.1()
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/audit.go:89 +0x250
panic({0x1aeab00, 0xc001400390})
    runtime/panic.go:838 +0x207
github.com/openshift/library-go/pkg/oauth/usercache.(*GroupCache).GroupsFor(0xc00081bf18?, {0xc000c8ac03?, 0xc001400360?})
    github.com/openshift/library-go@v0.0.0-20211013122800-874db8a3dac9/pkg/oauth/usercache/groups.go:47 +0xe7
github.com/openshift/oauth-server/pkg/groupmapper.(*UserGroupsMapper).processGroups(0xc0002c8880, {0xc0005d4e60, 0xd}, {0xc000c8ac03, 0x7}, 0x1?)
    github.com/openshift/oauth-server/pkg/groupmapper/groupmapper.go:101 +0xb5
github.com/openshift/oauth-server/pkg/groupmapper.(*UserGroupsMapper).UserFor(0xc0002c8880, {0x20f3c40, 0xc000e18bc0})
    github.com/openshift/oauth-server/pkg/groupmapper/groupmapper.go:83 +0xf4
github.com/openshift/oauth-server/pkg/oauth/external.(*Handler).login(0xc00022bc20, {0x20eebb0, 0xc00041b058}, 0xc0015d8200, 0xc001438140?, {0xc0000e7ce0, 0x150})
    github.com/openshift/oauth-server/pkg/oauth/external/handler.go:209 +0x74f
github.com/openshift/oauth-server/pkg/oauth/external.(*Handler).ServeHTTP(0xc00022bc20, {0x20eebb0, 0xc00041b058}, 0x0?)
    github.com/openshift/oauth-server/pkg/oauth/external/handler.go:180 +0x74a
net/http.(*ServeMux).ServeHTTP(0x1c9dda0?, {0x20eebb0, 0xc00041b058}, 0xc0015d8200)
    net/http/server.go:2462 +0x149
github.com/openshift/oauth-server/pkg/server/headers.WithRestoreAuthorizationHeader.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200)
    github.com/openshift/oauth-server/pkg/server/headers/oauthbasic.go:27 +0x10f
net/http.HandlerFunc.ServeHTTP(0x0?, {0x20eebb0?, 0xc00041b058?}, 0x0?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:103 +0x1a5
net/http.HandlerFunc.ServeHTTP(0xc0005e0280?, {0x20eebb0?, 0xc00041b058?}, 0x0?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filters.WithAuthorization.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/authorization.go:64 +0x498
net/http.HandlerFunc.ServeHTTP(0x0?, {0x20eebb0?, 0xc00041b058?}, 0x0?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:79 +0x178
net/http.HandlerFunc.ServeHTTP(0x2f6cea0?, {0x20eebb0?, 0xc00041b058?}, 0x3?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/server/filters.WithMaxInFlightLimit.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200)
    k8s.io/apiserver@v0.22.2/pkg/server/filters/maxinflight.go:187 +0x2a4
net/http.HandlerFunc.ServeHTTP(0x0?, {0x20eebb0?, 0xc00041b058?}, 0x0?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:103 +0x1a5
net/http.HandlerFunc.ServeHTTP(0x11?, {0x20eebb0?, 0xc00041b058?}, 0x1aae340?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filters.WithImpersonation.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/impersonation.go:50 +0x21c
net/http.HandlerFunc.ServeHTTP(0xc000d52120?, {0x20eebb0?, 0xc00041b058?}, 0x0?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:79 +0x178
net/http.HandlerFunc.ServeHTTP(0x0?, {0x20eebb0?, 0xc00041b058?}, 0x0?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1({0x20eebb0, 0xc00041b058}, 0xc0015d8200)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:103 +0x1a5
net/http.HandlerFunc.ServeHTTP(0xc0015d8100?, {0x20eebb0?, 0xc00041b058?}, 0xc000531930?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filters.WithAudit.func1({0x7fae682a40d8?, 0xc00041b048}, 0x9dbbaa?)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/audit.go:111 +0x549
net/http.HandlerFunc.ServeHTTP(0xc00003def0?, {0x7fae682a40d8?, 0xc00041b048?}, 0x0?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x7fae682a40d8, 0xc00041b048}, 0xc0015d8100)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:79 +0x178
net/http.HandlerFunc.ServeHTTP(0x0?, {0x7fae682a40d8?, 0xc00041b048?}, 0x0?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filterlatency.trackCompleted.func1({0x7fae682a40d8, 0xc00041b048}, 0xc0015d8100)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:103 +0x1a5
net/http.HandlerFunc.ServeHTTP(0x20f0f58?, {0x7fae682a40d8?, 0xc00041b048?}, 0x20cfd00?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filters.withAuthentication.func1({0x7fae682a40d8, 0xc00041b048}, 0xc0015d8100)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/authentication.go:80 +0x8b9
net/http.HandlerFunc.ServeHTTP(0x20f0f20?, {0x7fae682a40d8?, 0xc00041b048?}, 0x20cfc08?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filterlatency.trackStarted.func1({0x7fae682a40d8, 0xc00041b048}, 0xc000e69e00)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filterlatency/filterlatency.go:88 +0x46b
net/http.HandlerFunc.ServeHTTP(0xc0019f5890?, {0x7fae682a40d8?, 0xc00041b048?}, 0xc000848764?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/server/filters.WithCORS.func1({0x7fae682a40d8, 0xc00041b048}, 0xc000e69e00)
    k8s.io/apiserver@v0.22.2/pkg/server/filters/cors.go:75 +0x10b
net/http.HandlerFunc.ServeHTTP(0xc00149a380?, {0x7fae682a40d8?, 0xc00041b048?}, 0xc0008487d0?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP.func1()
    k8s.io/apiserver@v0.22.2/pkg/server/filters/timeout.go:108 +0xa2
created by k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP
    k8s.io/apiserver@v0.22.2/pkg/server/filters/timeout.go:94 +0x2cc

goroutine 3706802 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x19eb780?, 0xc001206e20})
    k8s.io/apimachinery@v0.22.2/pkg/util/runtime/runtime.go:74 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0xc0016aec60, 0x1, 0x1560f26?})
    k8s.io/apimachinery@v0.22.2/pkg/util/runtime/runtime.go:48 +0x75
panic({0x19eb780, 0xc001206e20})
    runtime/panic.go:838 +0x207
k8s.io/apiserver/pkg/server/filters.(*timeoutHandler).ServeHTTP(0xc0005047c8, {0x20eecd0?, 0xc0010fae00}, 0xdf8475800?)
    k8s.io/apiserver@v0.22.2/pkg/server/filters/timeout.go:114 +0x452
k8s.io/apiserver/pkg/endpoints/filters.withRequestDeadline.func1({0x20eecd0, 0xc0010fae00}, 0xc000e69d00)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/request_deadline.go:101 +0x494
net/http.HandlerFunc.ServeHTTP(0xc0016af048?, {0x20eecd0?, 0xc0010fae00?}, 0xc0000bc138?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/server/filters.WithWaitGroup.func1({0x20eecd0?, 0xc0010fae00}, 0xc000e69d00)
    k8s.io/apiserver@v0.22.2/pkg/server/filters/waitgroup.go:59 +0x177
net/http.HandlerFunc.ServeHTTP(0x20f0f58?, {0x20eecd0?, 0xc0010fae00?}, 0x7fae705daff0?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filters.WithAuditAnnotations.func1({0x20eecd0, 0xc0010fae00}, 0xc000e69c00)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/audit_annotations.go:37 +0x230
net/http.HandlerFunc.ServeHTTP(0x20f0f58?, {0x20eecd0?, 0xc0010fae00?}, 0x20cfc08?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filters.WithWarningRecorder.func1({0x20eecd0?, 0xc0010fae00}, 0xc000e69b00)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/warning.go:35 +0x2bb
net/http.HandlerFunc.ServeHTTP(0x1c9dda0?, {0x20eecd0?, 0xc0010fae00?}, 0xd?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filters.WithCacheControl.func1({0x20eecd0, 0xc0010fae00}, 0x0?)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/cachecontrol.go:31 +0x126
net/http.HandlerFunc.ServeHTTP(0x20f0f58?, {0x20eecd0?, 0xc0010fae00?}, 0x20cfc08?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/server/httplog.WithLogging.func1({0x20ef480?, 0xc001c20620}, 0xc000e69a00)
    k8s.io/apiserver@v0.22.2/pkg/server/httplog/httplog.go:103 +0x518
net/http.HandlerFunc.ServeHTTP(0x20f0f58?, {0x20ef480?, 0xc001c20620?}, 0x20cfc08?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filters.WithRequestInfo.func1({0x20ef480, 0xc001c20620}, 0xc000e69900)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/requestinfo.go:39 +0x316
net/http.HandlerFunc.ServeHTTP(0x20f0f58?, {0x20ef480?, 0xc001c20620?}, 0xc0007c3f70?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filters.withRequestReceivedTimestampWithClock.func1({0x20ef480, 0xc001c20620}, 0xc000e69800)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/request_received_time.go:38 +0x27e
net/http.HandlerFunc.ServeHTTP(0x419e2c?, {0x20ef480?, 0xc001c20620?}, 0xc0007c3e40?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/server/filters.withPanicRecovery.func1({0x20ef480?, 0xc001c20620?}, 0xc0004ff600?)
    k8s.io/apiserver@v0.22.2/pkg/server/filters/wrap.go:74 +0xb1
net/http.HandlerFunc.ServeHTTP(0x1c05260?, {0x20ef480?, 0xc001c20620?}, 0x8?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/endpoints/filters.withAuditID.func1({0x20ef480, 0xc001c20620}, 0xc000e69600)
    k8s.io/apiserver@v0.22.2/pkg/endpoints/filters/with_auditid.go:66 +0x40d
net/http.HandlerFunc.ServeHTTP(0x1c9dda0?, {0x20ef480?, 0xc001c20620?}, 0xd?)
    net/http/server.go:2084 +0x2f
github.com/openshift/oauth-server/pkg/server/headers.WithPreserveAuthorizationHeader.func1({0x20ef480, 0xc001c20620}, 0xc000e69600)
    github.com/openshift/oauth-server/pkg/server/headers/oauthbasic.go:16 +0xe8
net/http.HandlerFunc.ServeHTTP(0xc0016af9d0?, {0x20ef480?, 0xc001c20620?}, 0x16?)
    net/http/server.go:2084 +0x2f
github.com/openshift/oauth-server/pkg/server/headers.WithStandardHeaders.func1({0x20ef480, 0xc001c20620}, 0x4d55c0?)
    github.com/openshift/oauth-server/pkg/server/headers/headers.go:30 +0x18f
net/http.HandlerFunc.ServeHTTP(0x0?, {0x20ef480?, 0xc001c20620?}, 0xc0016afac8?)
    net/http/server.go:2084 +0x2f
k8s.io/apiserver/pkg/server.(*APIServerHandler).ServeHTTP(0xc00098d622?, {0x20ef480?, 0xc001c20620?}, 0xc000401000?)
    k8s.io/apiserver@v0.22.2/pkg/server/handler.go:189 +0x2b
net/http.serverHandler.ServeHTTP({0xc0019f5170?}, {0x20ef480, 0xc001c20620}, 0xc000e69600)
    net/http/server.go:2916 +0x43b
net/http.(*conn).serve(0xc0002b1720, {0x20f0f58, 0xc0001e8120})
    net/http/server.go:1966 +0x5d7
created by net/http.(*Server).Serve
    net/http/server.go:3071 +0x4db
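The panic above comes from an unchecked type assertion on a cache entry. A defensive sketch of the kind of guard that avoids it (illustrative only, not the actual library-go change):

```
package example

import userv1 "github.com/openshift/api/user/v1"

// groupsFromCache skips cache entries that are nil or not *userv1.Group
// instead of panicking on the type assertion.
func groupsFromCache(entries []interface{}) []*userv1.Group {
	var groups []*userv1.Group
	for _, entry := range entries {
		group, ok := entry.(*userv1.Group)
		if !ok || group == nil {
			continue // stale or mistyped entry in the group cache
		}
		groups = append(groups, group)
	}
	return groups
}
```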

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.11.13

How reproducible:

- Always

Steps to Reproduce:

1. Install OpenShift Container Platform 4.11
2. Configure OpenID Group Sync (as per https://docs.openshift.com/container-platform/4.11/authentication/identity_providers/configuring-oidc-identity-provider.html#identity-provider-oidc-CR_configuring-oidc-identity-provider)
3. Have users with hundrets of groups
4. Login and after a while, remove some Groups from the user in the IDP and from OpenShift Container Platform 
5. Try to login again and see the panic in oauth-apiserver

Actual results:

User is unable to login and oauth pods are reporting a panic as shown above

Expected results:

oauth-apiserver should invalidate the cache quickly to remove potential invalid references to non-existing groups

Additional info:

 

This is a clone of issue OCPBUGS-22830. The following is the description of the original issue:

Description of problem:

The Google Cloud CLI deprecated Python 3.5-3.7 starting with 448.0.0, causing release CI jobs to fail with: ERROR: gcloud failed to load. You are running gcloud with Python 3.6, which is no longer supported by gcloud. The gcloud version was pinned to 447.0.0.
job link: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-o[…]cp-upi-f28-destructive/1719562110486188032

Description of problem:

In 4.10 we added an option, REGISTRY_AUTH_PREFERENCE, to opt in to the podman registry auth file preference reading order. This is important for oc registry commands like oc registry login and oc image. https://github.com/openshift/oc/pull/893

We also started warning users that we will remove support for the docker order and default to the podman order, meaning we will check podman locations first and then fall back to docker locations.
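A sketch of the lookup order being described (the paths are the common podman and docker defaults; this is not the oc implementation itself):

```
package example

import (
	"os"
	"path/filepath"
)

// authFileCandidates returns auth file locations in the order they should be
// tried. Podman locations come first by default; the legacy docker order is
// only used when explicitly requested.
func authFileCandidates() []string {
	home, _ := os.UserHomeDir()
	podman := []string{
		filepath.Join(os.Getenv("XDG_RUNTIME_DIR"), "containers", "auth.json"),
	}
	docker := []string{
		filepath.Join(home, ".docker", "config.json"),
	}
	if os.Getenv("REGISTRY_AUTH_PREFERENCE") == "docker" {
		return append(docker, podman...) // legacy order
	}
	return append(podman, docker...) // proposed default: podman first
}
```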

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

We should default to podman auth file locations and remove the warning when using oc registry login or oc image commands without the REGISTRY_AUTH_PREFERENCE variable.

Additional info:

 

Description of problem:

IPI install on Azure Stack failed when setting platform.azure.osDisk.diskType to StandardSSD_LRS in install-config.yaml.

When setting controlPlane.platform.azure.osDisk.diskType to StandardSSD_LRS, the following error appears in the Terraform log, and some resources have already been created.

level=error msg=Error: expected storage_os_disk.0.managed_disk_type to be one of [Premium_LRS Standard_LRS], got StandardSSD_LRS
level=error
level=error msg=  with azurestack_virtual_machine.bootstrap,
level=error msg=  on main.tf line 107, in resource "azurestack_virtual_machine" "bootstrap":
level=error msg= 107: resource "azurestack_virtual_machine" "bootstrap" {
level=error
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "bootstrap" stage: failed to create cluster: failed to apply Terraform: exit status 1
level=error
level=error msg=Error: expected storage_os_disk.0.managed_disk_type to be one of [Premium_LRS Standard_LRS], got StandardSSD_LRS
level=error
level=error msg=  with azurestack_virtual_machine.bootstrap,
level=error msg=  on main.tf line 107, in resource "azurestack_virtual_machine" "bootstrap":
level=error msg= 107: resource "azurestack_virtual_machine" "bootstrap" {
level=error
level=error

When setting compute.platform.azure.osDisk.diskType to StandardSSD_LRS, compute machines fail to provision

$ oc get machine -n openshift-machine-api
NAME                                     PHASE     TYPE              REGION   ZONE   AGE
jima414ash03-xkq5x-master-0              Running   Standard_DS4_v2   mtcazs          62m
jima414ash03-xkq5x-master-1              Running   Standard_DS4_v2   mtcazs          62m
jima414ash03-xkq5x-master-2              Running   Standard_DS4_v2   mtcazs          62m
jima414ash03-xkq5x-worker-mtcazs-89mgn   Failed                                      52m
jima414ash03-xkq5x-worker-mtcazs-jl5kk   Failed                                      52m
jima414ash03-xkq5x-worker-mtcazs-p5kvw   Failed                                      52m

$ oc describe machine jima414ash03-xkq5x-worker-mtcazs-jl5kk -n openshift-machine-api
...
  Error Message:           failed to reconcile machine "jima414ash03-xkq5x-worker-mtcazs-jl5kk": failed to create vm jima414ash03-xkq5x-worker-mtcazs-jl5kk: failure sending request for machine jima414ash03-xkq5x-worker-mtcazs-jl5kk: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="Storage account type 'StandardSSD_LRS' is supported by Microsoft.Compute API version 2018-04-01 and above" Target="osDisk.managedDisk.storageAccountType"
...

Based on the Azure Stack doc[1], the supported disk types on ASH are Premium SSD and Standard HDD. It's better to validate diskType on Azure Stack to avoid the above errors.

[1]https://learn.microsoft.com/en-us/azure-stack/user/azure-stack-managed-disk-considerations?view=azs-2206&tabs=az1%2Caz2#cheat-sheet-managed-disk-differences
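An illustrative validation sketch shaped after the cheat sheet referenced above (not the installer's actual code; the error wording is an assumption):

```
package example

import "fmt"

// Per the Azure Stack Hub managed-disk cheat sheet, only these storage
// account types are supported there.
var azureStackDiskTypes = map[string]bool{
	"Premium_LRS":  true, // Premium SSD
	"Standard_LRS": true, // Standard HDD
}

func validateAzureStackDiskType(diskType string) error {
	if diskType == "" || azureStackDiskTypes[diskType] {
		return nil
	}
	return fmt.Errorf("platform.azure.osDisk.diskType: unsupported value %q on Azure Stack Hub; allowed values are Premium_LRS and Standard_LRS", diskType)
}
```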

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-05-16-085836

How reproducible:

Always

Steps to Reproduce:

1. Prepare install-config.yaml, set platform.azure.osDisk.diskType to StandardSSD_LRS
2. Install IPI cluster on Azure Stack
3.

Actual results:

Installation failed

Expected results:

The installer validates diskType on Azure Stack Cloud and exits with an error message for unsupported disk types

Additional info:

 

Description of problem:

I am trying to build the operator image locally and it fails because the registry `registry.ci.openshift.org/ocp/` requires authorization

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. git clone git@github.com:openshift/cluster-ingress-operator.git
2. export REPO=<path to a repository to upload the image>
3. run `make release-local`

Actual results:

[skip several lines]
Step 1/10 : FROM registry.ci.openshift.org/ocp/builder:rhel-8-golang-1.19-openshift-4.12 AS builder                                                                                             
unauthorized: authentication required 

Expected results:

image is pulled and the build succeeded

Additional info:

There are two images that are not available:
- registry.ci.openshift.org/ocp/builder:rhel-8-golang-1.19-openshift-4.12
- registry.ci.openshift.org/ocp/4.12:base

I was able to fix this by changing the images to:
- registry.ci.openshift.org/openshift/release:golang-1.19
- registry.ci.openshift.org/origin/4.12:base

see https://github.com/dudinea/cluster-ingress-operator/tree/fix-build-images-not-public

I am not sure what I did is OK, but I suppose that this project, being part of OKD, should be easily buildable by the public, or at least the issue should be documented somewhere.

I wanted to post this to the OKD project, but I am unable to select it in Jira.

Description of problem:

RHCOS is being published to new AWS regions (https://github.com/openshift/installer/pull/6861) but aws-sdk-go needs to be bumped to recognize those regions

Version-Release number of selected component (if applicable):

master/4.14

How reproducible:

always

Steps to Reproduce:

1. openshift-install create install-config
2. Try to select ap-south-2 as a region
3.

Actual results:

New regions are not found. New regions are: ap-south-2, ap-southeast-4, eu-central-2, eu-south-2, me-central-1.

Expected results:

Installer supports and displays the new regions in the Survey

Additional info:

See https://github.com/openshift/installer/blob/master/pkg/asset/installconfig/aws/regions.go#L13-L23
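For illustration only, this is the kind of static region table that file maintains and that needs the new entries; the descriptions below are examples, not the installer's actual contents:

package main

import "fmt"

var awsRegions = map[string]string{
	"ap-south-1":     "Asia Pacific (Mumbai)",
	"ap-south-2":     "Asia Pacific (Hyderabad)", // newly published
	"ap-southeast-4": "Asia Pacific (Melbourne)", // newly published
	"eu-central-2":   "Europe (Zurich)",          // newly published
	"eu-south-2":     "Europe (Spain)",           // newly published
	"me-central-1":   "Middle East (UAE)",        // newly published
}

func main() {
	for region, description := range awsRegions {
		fmt.Printf("%s - %s\n", region, description)
	}
}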

 

This is a clone of issue OCPBUGS-19017. The following is the description of the original issue:

dnsmasq isn't starting on okd-scos in the bootstrap VM

 

Logs show it failing with "Operation not permitted"

The new test introduced by https://issues.redhat.com/browse/HOSTEDCP-960 fails for platforms other than AWS because some AWS specific conditions like `ValidAWSIdentityProvider` are always set regardless of the platform.

This is a clone of issue OCPBUGS-21776. The following is the description of the original issue:

Description of problem:  runtime zero namespaces ("default", "kube-system", "kube-public") are not excluded from pod security admission in hypershift guest cluster.
In OCP, these runtime zero namespaces are excluded from PSA.

How reproducible: Always 

Steps to Reproduce:

1. Install a fresh 4.14 hypershift cluster
2. Check the labels under default, kube-system, kube-public namespaces
3. Try to change the PSA value on these namespaces in hypershift guest cluster and the values are getting updated.

Actual results:

$ oc get ns default -oyaml --kubeconfig=guest.kubeconfig
...
  labels:
    kubernetes.io/metadata.name: default
  name: default
...
$ oc label ns default pod-security.kubernetes.io/enforce=restricted --overwrite --kubeconfig=guest.kubeconfig
namespace/default labeled
$ oc get ns default -oyaml --kubeconfig=guest.kubeconfig
...
  labels:
    kubernetes.io/metadata.name: default
    pod-security.kubernetes.io/enforce: restricted
  name: default

Expected results:

Runtime zero namespaces ("default", "kube-system", "kube-public") are excluded from pod security admission

Additional info:

The kube-system namespace is excluded from PSA in the guest cluster, but when trying to update the security.openshift.io/scc.podSecurityLabelSync value with true/false, it is not updated, whereas in the management cluster the podSecurityLabelSync value does get updated.

Description of the problem:

There is no size limit on the Additional certificates UI field

 

How reproducible:

100%

 

Steps to reproduce:

1. create a cluster  

2. On add host select 'Configure cluster-wide trusted certificates'

3. On Additional certificates, paste a big string 

4. Generate Discovery ISO

 

Actual results:

The UI sends it to the BE

 

Expected results:

There should be a size limit on the certificate field

Description of problem:

When using the command `oc-mirror --config config-oci-target.yaml  docker://localhost:5000  --use-oci-feature  --dest-use-http  --dest-skip-tls`, the command exits with code 0 but prints a log line like: unable to parse reference oci://mno/redhat-operator-index:v4.12: lstat /mno: no such file or directory.

Version-Release number of selected component (if applicable):

oc-mirror version 
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.13.0-202303011628.p0.g2e3885b.assembly.stream-2e3885b", GitCommit:"2e3885b469ee7d895f25833b04fd609955a2a9f6", GitTreeState:"clean", BuildDate:"2023-03-01T16:49:12Z", GoVersion:"go1.19.2", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1. with imagesetconfig like : 
cat config-oci-target.yaml 
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /home/ocmirrortest/0302/60597
mirror:
  operators:
  - catalog: oci:///home/ocmirrortest/noo/redhat-operator-index
    targetCatalog: mno/redhat-operator-index
    targetTag: v4.12
    packages:
    - name: aws-load-balancer-operator
`oc-mirror --config config-oci-target.yaml  docker://localhost:5000  --use-oci-feature  --dest-use-http  --dest-skip-tls`


Actual results:

1. The command exits with code 0, but prints strange logs like:
sha256:95c45fae0ca9e9bee0fa2c13652634e726d8133e4e3009b363fcae6814b3461d localhost:5000/albo/aws-load-balancer-rhel8-operator:95c45f
sha256:ab38b37c14f7f0897e09a18eca4a232a6c102b76e9283e401baed832852290b5 localhost:5000/albo/aws-load-balancer-rhel8-operator:ab38b3
info: Mirroring completed in 43.87s (28.5MB/s)
Rendering catalog image "localhost:5000/mno/redhat-operator-index:v4.12" with file-based catalog 
Writing image mapping to oc-mirror-workspace/results-1677743154/mapping.txt
Writing CatalogSource manifests to oc-mirror-workspace/results-1677743154
Writing ICSP manifests to oc-mirror-workspace/results-1677743154
unable to parse reference oci://mno/redhat-operator-index:v4.12: lstat /mno: no such file or directory

Expected results:

No such log should be printed.

 

Description of the problem:

In staging, BE 2.18.0 - Trying to set all validation IDs to be ignored with:

curl -X 'PUT' 'https://api.stage.openshift.com/api/assisted-install/v2/clusters/26a69b99-06a3-441b-be40-73cadbac6b6a/ignored-validations'   --header "Authorization: Bearer $(ocm token)"   -H 'accept: application/json'   -H 'Content-Type: application/json'   -d '{
  "host-validation-ids": "[]",                          
  "cluster-validation-ids": "[\"all\"]"       
}'

Getting this response:

 {"code":"400","href":"","id":400,"kind":"Error","reason":"cannot proceed due to the following errors: Validation ID 'all' is not a known cluster validation"}

How reproducible:

100%

Steps to reproduce:

1.

2.

3.

Actual results:

 

Expected results:
All ignorable validations should be added to the ignore list

Description of the problem:

Creating a host without any disks causes the following error log message, without any indicative error being displayed to the user.

In this case the status remains Discovering and the user cannot know what the issue is.

 

Log from the service:

time="2023-06-07T12:36:09Z" level=error msg="failed to create new validation context for host e0b465cc-e91f-4ca6-9594-27052a9a6f28" func="github.com/openshift/assisted-service/internal/host.(*Manager).IsValidMasterCandidate" file="/assisted-service/internal/host/host.go:1280" error="Inventory is not valid" pkg=cluster-state 

Example inventory:

{
  "bmc_address": "0.0.0.0",
  "bmc_v6address": "::/0",
  "boot": {
    "current_boot_mode": "uefi"
  },
  "cpu": {
    "architecture": "x86_64",
    "count": 8,
    "flags": [
      "fpu",
      "vme",
      "de",
      "pse",
      "tsc",
      "msr",
      "pae",
      "mce",
      "cx8",
      "apic",
      "sep",
      "mtrr",
      "pge",
      "mca",
      "cmov",
      "pat",
      "pse36",
      "clflush",
      "mmx",
      "fxsr",
      "sse",
      "sse2",
      "ht",
      "syscall",
      "nx",
      "mmxext",
      "fxsr_opt",
      "pdpe1gb",
      "rdtscp",
      "lm",
      "rep_good",
      "nopl",
      "cpuid",
      "extd_apicid",
      "tsc_known_freq",
      "pni",
      "pclmulqdq",
      "ssse3",
      "fma",
      "cx16",
      "pcid",
      "sse4_1",
      "sse4_2",
      "x2apic",
      "movbe",
      "popcnt",
      "tsc_deadline_timer",
      "aes",
      "xsave",
      "avx",
      "f16c",
      "rdrand",
      "hypervisor",
      "lahf_lm",
      "cmp_legacy",
      "cr8_legacy",
      "abm",
      "sse4a",
      "misalignsse",
      "3dnowprefetch",
      "osvw",
      "topoext",
      "perfctr_core",
      "ssbd",
      "ibrs",
      "ibpb",
      "stibp",
      "vmmcall",
      "fsgsbase",
      "tsc_adjust",
      "bmi1",
      "avx2",
      "smep",
      "bmi2",
      "rdseed",
      "adx",
      "smap",
      "clflushopt",
      "clwb",
      "sha_ni",
      "xsaveopt",
      "xsavec",
      "xgetbv1",
      "xsaves",
      "clzero",
      "xsaveerptr",
      "wbnoinvd",
      "arat",
      "umip",
      "vaes",
      "vpclmulqdq",
      "rdpid",
      "arch_capabilities"
    ],
    "frequency": 2545.214,
    "model_name": "AMD EPYC 7J13 64-Core Processor"
  },
  "disks": [],
  "gpus": [
    {
      "address": "0000:00:02.0"
    }
  ],
  "hostname": "02-00-17-01-2c-cf",
  "interfaces": [
    {
      "flags": [
        "up",
        "broadcast",
        "multicast"
      ],
      "has_carrier": true,
      "ipv4_addresses": [
        "10.0.28.205/20"
      ],
      "ipv6_addresses": [],
      "mac_address": "02:00:17:01:2c:cf",
      "mtu": 9000,
      "name": "ens3",
      "product": "0x101e",
      "speed_mbps": 50000,
      "type": "physical",
      "vendor": "0x15b3"
    }
  ],
  "memory": {
    "physical_bytes": 17179869184,
    "physical_bytes_method": "dmidecode",
    "usable_bytes": 16765730816
  },
  "routes": [
    {
      "destination": "0.0.0.0",
      "family": 2,
      "gateway": "10.0.16.1",
      "interface": "ens3",
      "metric": 100
    },
    {
      "destination": "10.0.16.0",
      "family": 2,
      "interface": "ens3",
      "metric": 100
    },
    {
      "destination": "10.88.0.0",
      "family": 2,
      "interface": "cni-podman0"
    },
    {
      "destination": "169.254.0.0",
      "family": 2,
      "interface": "ens3",
      "metric": 100
    },
    {
      "destination": "::1",
      "family": 10,
      "interface": "lo",
      "metric": 256
    },
    {
      "destination": "fe80::",
      "family": 10,
      "interface": "cni-podman0",
      "metric": 256
    },
    {
      "destination": "fe80::",
      "family": 10,
      "interface": "ens3",
      "metric": 1024
    }
  ],
  "system_vendor": {
    "manufacturer": "QEMU",
    "product_name": "Standard PC (i440FX + PIIX, 1996)",
    "virtual": true
  },
  "tpm_version": "none"
}
 

Steps to reproduce:

1. Register a new cluster 

2. Generate image and deploy nodes without disks

 

Actual results:

 

Expected results:

Fail validation if the inventory is invalid.
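A minimal sketch of the kind of guard that would surface this to the user (type and function names here are hypothetical, not the actual assisted-service code):

package hostvalidation

import "fmt"

// Inventory is a trimmed-down, hypothetical stand-in for the host inventory
// shown above; only the field relevant to this check is included.
type Inventory struct {
	Disks []struct{ Name string }
}

// ValidateHasDisks rejects inventories that report no disks so the host gets
// an explicit failed validation instead of silently staying in Discovering.
func ValidateHasDisks(inv *Inventory) error {
	if inv == nil || len(inv.Disks) == 0 {
		return fmt.Errorf("inventory reports no disks; host cannot be a candidate")
	}
	return nil
}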

 

Description of problem:

[performance] Checking IRQBalance settings Verify GloballyDisableIrqLoadBalancing Spec field [test_id:36150] Verify that IRQ load balancing is enabled/disabled correctly

[rfe_id:27368][performance] Pre boot tuning adjusted by tuned [test_id:35363][crit:high][vendor:cnf-qe@redhat.com][level:acceptance] stalld daemon is running on the host

[rfe_id:27363][performance] CPU Management Verification of cpu manager functionality Verify CPU usage by stress PODs [test_id:27492] Guaranteed POD should work on isolated cpu

These tests fail often in 4.13 and 4.14 upstream CI jobs:

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.13-e2e-telco5g-cnftests/1669344976506458112/artifacts/e2e-telco5g-cnftests/telco5g-cnf-tests/artifacts/test_results.html


Version-Release number of selected component (if applicable):

4.14 4.13

How reproducible:

CI job

Steps to Reproduce:

Ci job

Actual results:

failures

Expected results:

pass

Additional info:

History and pass/fail ratio: https://snapshots.raintank.io/dashboard/snapshot/6sZ1uBR5P1O1gknyxebPQPtEo7RVEu0C

This is a clone of issue OCPBUGS-19918. The following is the description of the original issue:

Description of problem:

Issue was found when analyzing  bug https://issues.redhat.com/browse/OCPBUGS-19817

Version-Release number of selected component (if applicable):

4.15.0-0.ci-2023-09-25-165744

How reproducible:

Every time

Steps to Reproduce:

The cluster is an IPsec cluster with the NS extension and the ipsec service enabled.
1. Enable E-W IPsec and wait for the cluster to settle.
2. Disable IPsec and wait for the cluster to settle.

You'll observe that the ipsec pods are deleted.

Actual results:

no pods

Expected results:

pods should stay
see https://github.com/openshift/cluster-network-operator/blob/master/pkg/network/ovn_kubernetes.go#L314
	// If IPsec is enabled for the first time, we start the daemonset. If it is
	// disabled after that, we do not stop the daemonset but only stop IPsec.
	//
	// TODO: We need to do this as, by default, we maintain IPsec state on the
	// node in order to maintain encrypted connectivity in the case of upgrades.
	// If we only unrender the IPsec daemonset, we will be unable to cleanup
	// the IPsec state on the node and the traffic will continue to be
	// encrypted.

Additional info:


Description of problem:

Any FBC enabled OLM Catalog displays the Channels in a random order.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Create a catalog source for icr.io/cpopen/ibm-operator-catalog:latest
2. Navigate to OperatorHub
3. Click on the `ibm-mq` operator
4. Click on the Install button.

Actual results:

The list of channels is in random order. The order changes with each page refresh.

Expected results:

The list of channels should be in lexicographical ascending order as it was for SQLITE based catalogs.

Additional info:

See related operator-registry upstream issue:
https://github.com/operator-framework/operator-registry/issues/1069#top

Note:  I think both `operator-registry` and the OperatorHub should provide deterministic sorting of these channels.
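A minimal sketch of the deterministic ordering being asked for, assuming a plain slice of channel names (this is not the actual operator-registry or console code):

package main

import (
	"fmt"
	"sort"
)

func main() {
	// Channel names as they might come back from a file-based catalog,
	// in whatever order map iteration produced them.
	channels := []string{"v2.3", "v1.2", "v2.0", "v1.1"}

	// Lexicographical ascending order, matching the old SQLite-based behaviour.
	sort.Strings(channels)

	fmt.Println(channels) // [v1.1 v1.2 v2.0 v2.3]
}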

Description of problem:

With a new S3 bucket, the HostedCluster failed with this condition:
- lastTransitionTime: "2023-04-13T14:17:11Z"
  message: 'failed to upload /.well-known/openid-configuration to the heli-hypershift-demo-oidc-2
    s3 bucket: aws returned an error: AccessControlListNotSupported'
  observedGeneration: 3
  reason: OIDCConfigurationInvalid
  status: "False"
  type: ValidOIDCConfiguration

Version-Release number of selected component (if applicable):

 

How reproducible:

1. Create the S3 bucket:
$ aws s3api create-bucket --create-bucket-configuration  LocationConstraint=us-east-2 --region=us-east-2 --bucket heli-hypershift-demo-oidc-2
{
  "Location": "http://heli-hypershift-demo-oidc-2.s3.amazonaws.com/"
}
[cloud-user@heli-rhel-8 ~]$ aws s3api delete-public-access-block --bucket heli-hypershift-demo-oidc-2

2. Install HO and create a HostedCluster on AWS us-west-2.
3. The HostedCluster fails with the ValidOIDCConfiguration=False condition shown above (reason OIDCConfigurationInvalid, AccessControlListNotSupported).

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

The HostedCluster is created successfully

Additional info:

 

Description of problem:

These alerts fire without a namespace label:
* KubeStateMetricsListErrors
* KubeStateMetricsWatchErrors
* KubeletPlegDurationHigh
* KubeletTooManyPods
* KubeNodeReadinessFlapping
* KubeletPodStartUpLatencyHigh

Alerting rules without a namespace label make it harder for cluster admins to route the alerts.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1. Check the definitions of the said alerting rules.

Actual results:

The PromQL expressions aggregate away the namespace label and there's no static namespace label either.

Expected results:

Static namespace label in the rule definition.

Additional info:

https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide

Alerts SHOULD include a namespace label indicating the source of the alert. Many alerts will include this by virtue of the fact that their PromQL expressions result in a namespace label. Others may require a static namespace label
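As a hedged sketch, a static label of the following shape in the rule definition would satisfy the style guide; the alert name is reused from the list above, while the expression, threshold, severity, and namespace value are approximate/illustrative:

- alert: KubeStateMetricsListErrors
  expr: |
    (sum(rate(kube_state_metrics_list_total{job="kube-state-metrics",result="error"}[5m]))
      / sum(rate(kube_state_metrics_list_total{job="kube-state-metrics"}[5m]))) > 0.01
  for: 15m
  labels:
    severity: warning
    namespace: openshift-monitoring   # static namespace label so admins can route the alert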

This is a clone of issue OCPBUGS-18003. The following is the description of the original issue:

Description of problem:

Found that automated case OCP-42340 failed in a CI job whose version is 4.14.0-ec.4, and then reproduced the issue in 4.14.0-0.nightly-2023-08-22-221456

Version-Release number of selected component (if applicable):

4.14.0-ec.4 4.14.0-0.nightly-2023-08-22-221456

How reproducible:

Always

Steps to Reproduce:

1. Deploy egressrouter on baremetal with 
{
    "kind": "List",
    "apiVersion": "v1",
    "metadata": {},
    "items": [
        {
            "apiVersion": "network.operator.openshift.io/v1",
            "kind": "EgressRouter",
            "metadata": {
                "name": "egressrouter-42430",
                "namespace": "e2e-test-networking-egressrouter-l4xgx"
            },
            "spec": {
                "addresses": [
                    {
                        "gateway": "192.168.111.1",
                        "ip": "192.168.111.55/24"
                    }
                ],
                "mode": "Redirect",
                "networkInterface": {
                    "macvlan": {
                        "mode": "Bridge"
                    }
                },
                "redirect": {
                    "redirectRules": [
                        {
                            "destinationIP": "142.250.188.206",
                            "port": 80,
                            "protocol": "TCP"
                        },
                        {
                            "destinationIP": "142.250.188.206",
                            "port": 8080,
                            "protocol": "TCP",
                            "targetPort": 80
                        },
                        {
                            "destinationIP": "142.250.188.206",
                            "port": 8888,
                            "protocol": "TCP",
                            "targetPort": 80
                        }
                    ]
                }
            }
        }
    ]
}

 % oc get pods -n  e2e-test-networking-egressrouter-l4xgx -o wide
NAME                                           READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES
egress-router-cni-deployment-c4bff88cf-skv9j   1/1     Running   0          69m   10.131.0.26   worker-0   <none>           <none>

2. Create service which point to egressrouter
% oc get svc -n e2e-test-networking-egressrouter-l4xgx -o yaml  
apiVersion: v1
items:
- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2023-08-23T05:58:30Z"
    name: ovn-egressrouter-multidst-svc
    namespace: e2e-test-networking-egressrouter-l4xgx
    resourceVersion: "50383"
    uid: 07341ff1-6df3-40a6-b27e-59102d56e9c1
  spec:
    clusterIP: 172.30.10.103
    clusterIPs:
    - 172.30.10.103
    internalTrafficPolicy: Cluster
    ipFamilies:
    - IPv4
    ipFamilyPolicy: SingleStack
    ports:
    - name: con1
      port: 80
      protocol: TCP
      targetPort: 80
    - name: con2
      port: 5000
      protocol: TCP
      targetPort: 8080
    - name: con3
      port: 6000
      protocol: TCP
      targetPort: 8888
    selector:
      app: egress-router-cni
    sessionAffinity: None
    type: ClusterIP
  status:
    loadBalancer: {}
kind: List
metadata:
  resourceVersion: ""

  3. create a test pod to access the service or curl the egressrouter IP:port directly 
oc rsh -n e2e-test-networking-egressrouter-l4xgx hello-pod1                                  
~ $ curl 172.30.10.103:80 --connect-timeout 5
curl: (28) Connection timeout after 5001 ms
~ $ curl 10.131.0.26:80 --connect-timeout 5
curl: (28) Connection timeout after 5001 ms
 $ curl 10.131.0.26:8080 --connect-timeout 5
curl: (28) Connection timeout after 5001 ms




Actual results:

  connection failed

Expected results:

  connection succeeds

Additional info:
Note, the issue didn't exist in 4.13. It passed in 4.13 latest nightly build 4.13.0-0.nightly-2023-08-11-101506

08-23 15:26:16.955  passed: (1m3s) 2023-08-23T07:26:07 "[sig-networking] SDN ConnectedOnly-Author:huirwang-High-42340-Egress router redirect mode with multiple destinations."

Description of problem:

After extensive debugging of HostedControlPlanes in dual-stack mode, we have discovered that the QE department is hitting issues in dual-stack environments.

In HyperShift/HostedControlPlane, we have an HAProxy in the data plane (the worker nodes of the HostedCluster). This HAProxy is unable to redirect calls to the KubeAPIServer in the control plane; it attempts to connect using both protocols, IPv6 first and then IPv4. The issue is that the HostedCluster is exposing services in NodePort mode, and it seems that the master nodes of the management cluster are not opening these NodePorts on IPv6, only on IPv4.
Even though the master node shows this trace with netstat:

tcp6 9 0 :::32272 :::* LISTEN 6086/ovnkube

It seems that the port is only actually open on IPv4, as it is not possible to connect to the API via IPv6 even locally. This only happens with dual stack; both IPv4 and IPv6 work correctly in single-stack mode.

Version-Release number of selected component (if applicable):

4.14.X
4.15.X

How reproducible:

100%

Steps to Reproduce:

1. Deploy an Openshift management cluster in dual stack mode
2. Deploy MCE 2.4
3. Deploy a HostedCluster in dual stack mode 

Actual results:

- Many pods stuck in ContainerCreating state
- The HostedCluster cannot be deployed, many COs blocked and clusterversion also stuck

Expected results:

HostedCluster deployment done

Additional info:

To reproduce the issue you could contact @jparrill or @Liangquan Li on Slack; this will make environment creation easier.

Description of problem:

The ingress operator is constantly reverting the internal service when it detects a service change that consists only of default values.

Version-Release number of selected component (if applicable):

4.13, 4.14

How reproducible:

100%

Steps to Reproduce:

1. Create an ingress controller
2. Watch ingress operator logs for excess updates "updated internal service"
[I'll provide a more specific reproducer if needed]

Actual results:

Excess:
2023-05-04T02:08:02.331Z INFO operator.ingress_controller ingress/internal_service.go:44 updated internal service ...

Expected results:

No updates

Additional info:

The diff looks like:
2023-05-05T15:12:06.668Z    INFO    operator.ingress_controller    ingress/internal_service.go:44    updated internal service    {"namespace": "openshift-ingress", "name": "router-internal-default", "diff": "  &v1.Service{
    TypeMeta:   {},
    ObjectMeta: {Name: \"router-internal-default\", Namespace: \"openshift-ingress\", UID: \"815f1499-a4d4-4cb8-9a5b-9905580e0ffd\", ResourceVersion: \"8031\", ...},
    Spec: v1.ServiceSpec{
      Ports:                    {{Name: \"http\", Protocol: \"TCP\", Port: 80, TargetPort: {Type: 1, StrVal: \"http\"}, ...}, {Name: \"https\", Protocol: \"TCP\", Port: 443, TargetPort: {Type: 1, StrVal: \"https\"}, ...}, {Name: \"metrics\", Protocol: \"TCP\", Port: 1936, TargetPort: {Type: 1, StrVal: \"metrics\"}, ...}},
      Selector:                 {\"ingresscontroller.operator.openshift.io/deployment-ingresscontroller\": \"default\"},
      ClusterIP:                \"172.30.56.107\",
-     ClusterIPs:               []string{\"172.30.56.107\"},
+     ClusterIPs:               nil,
      Type:                     \"ClusterIP\",
      ExternalIPs:              nil,
-     SessionAffinity:          \"None\",
+     SessionAffinity:          \"\",
      LoadBalancerIP:           \"\",
      LoadBalancerSourceRanges: nil,
      ... // 3 identical fields
      PublishNotReadyAddresses:      false,
      SessionAffinityConfig:         nil,
-     IPFamilies:                    []v1.IPFamily{\"IPv4\"},
+     IPFamilies:                    nil,
-     IPFamilyPolicy:                &\"SingleStack\",
+     IPFamilyPolicy:                nil,
      AllocateLoadBalancerNodePorts: nil,
      LoadBalancerClass:             nil,
-     InternalTrafficPolicy:         &\"Cluster\",
+     InternalTrafficPolicy:         nil,
    },
    Status: {},
  }
"}

From unit testing, it looks like internalServiceChanged returns true when spec.IPFamilies, spec.IPFamilyPolicy, and spec.InternalTrafficPolicy are set to the default values that you see in the diff above.

The ingress operator then resets them back to nil, the API server sets them to their defaults again, and this process repeats.

internalServiceChanged should either ignore these fields or explicitly set them.
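A hedged sketch of the "explicitly set these values" option (field values taken from the diff above; the helper name is hypothetical and this is not the actual ingress-operator patch):

package ingress

import corev1 "k8s.io/api/core/v1"

// setAPIServerDefaults fills in the fields the API server would otherwise
// default (per the diff above), so comparing the desired service against the
// live one no longer produces a spurious diff and a useless update.
func setAPIServerDefaults(svc *corev1.Service) {
	policy := corev1.IPFamilyPolicySingleStack
	itp := corev1.ServiceInternalTrafficPolicyCluster
	svc.Spec.IPFamilies = []corev1.IPFamily{corev1.IPv4Protocol}
	svc.Spec.IPFamilyPolicy = &policy
	svc.Spec.InternalTrafficPolicy = &itp
	svc.Spec.SessionAffinity = corev1.ServiceAffinityNone
}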

Description of problem:

The bootstrapExternalStaticGateway IP is used as the DNS server for the bootstrap node

Version-Release number of selected component (if applicable):

4.11

How reproducible:

100%

Steps to Reproduce:

1. Deploy baremetal IPI using a static bootstrap IP.
2. It consumes bootstrapExternalStaticGateway as DNS for the bootstrap node.
3.

Actual results:

Sometimes bootstrapExternalStaticGateway cannot act as DNS

Expected results:

DNS resolution should work on bootstrap if it uses static IP

Additional info:

 

Description of problem:

As a user, when I select the All projects option from the Projects dropdown on the Dev perspective Pipelines pages, the selected option is shown as undefined.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

 

Steps to Reproduce:

1. Navigate to Pipelines page in the Dev perspective
2. Select the All projects option from the Projects dropdown

Actual results:

The selected option shows as undefined and the All projects list is not shown

Expected results:

The selected option should be All projects and it should open the All projects list page

Additional info:

aws-ebs-csi-driver-controller-ca ServiceAccount does not include the HCP pull-secret in its imagePullSecrets. Thus, if a HostedCluster is created with a `pullSecret` that contains creds that the management cluster pull secret does not have, the image pull fails.
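For reference, a minimal sketch of what including the pull secret on the ServiceAccount would look like (names are illustrative; the real secret name comes from the HostedControlPlane's pull secret):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-ebs-csi-driver-controller-sa   # illustrative
  namespace: clusters-example               # HCP namespace, illustrative
imagePullSecrets:
- name: pull-secret                          # the HCP pull secret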

This is a clone of issue OCPBUGS-26488. The following is the description of the original issue:

Description of problem:

CCO reports credsremoved mode in metrics when the cluster is actually in the default mode. 
See https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/47349/rehearse-47349-pull-ci-openshift-cloud-credential-operator-release-4.16-e2e-aws-qe/1744240905512030208 (OCP-31768). 

Version-Release number of selected component (if applicable):

4.16

How reproducible:

Always. 

Steps to Reproduce:

1. Create an AWS cluster with CCO in the default mode (ends up in mint)
2. Get the value of the cco_credentials_mode metric
    

Actual results:

credsremoved    

Expected results:

mint    

Root cause:

The controller-runtime client used in metrics calculator (https://github.com/openshift/cloud-credential-operator/blob/77a68ad01e75162bfa04097b22f80d305c192439/pkg/operator/metrics/metrics.go#L77) is unable to GET the root credentials Secret (https://github.com/openshift/cloud-credential-operator/blob/77a68ad01e75162bfa04097b22f80d305c192439/pkg/operator/metrics/metrics.go#L184) since it is backed by a cache which only contains target Secrets requested by other operators (https://github.com/openshift/cloud-credential-operator/blob/77a68ad01e75162bfa04097b22f80d305c192439/pkg/cmd/operator/cmd.go#L164-L168).
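A hedged sketch of one way around this, reading the Secret through controller-runtime's uncached API reader instead of the cached client (names are illustrative; this is not the actual CCO change):

package metricssketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/manager"
)

// getRootSecret reads the root credentials Secret through the manager's API
// reader, which goes straight to the API server and bypasses the informer cache.
func getRootSecret(ctx context.Context, mgr manager.Manager) (*corev1.Secret, error) {
	secret := &corev1.Secret{}
	key := client.ObjectKey{Namespace: "kube-system", Name: "aws-creds"} // illustrative
	if err := mgr.GetAPIReader().Get(ctx, key, secret); err != nil {
		return nil, err
	}
	return secret, nil
}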

Description of problem:

Failed test: logs in as 'test' user via htpasswd identity provider (Auth test logs in as 'test' user via htpasswd identity provider)

CI-search
Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

There is a problem with IBM ROKS (managed service) running 4.14+

cluster-storage-operator never sets the upgradeable=True condition, so it shows up as Unknown:

  - lastTransitionTime: "2023-11-08T19:07:01Z"
    reason: NoData
    status: Unknown
    type: Upgradeable

This is a regression from 4.13.

In 4.13, pkg/operator/snapshotcrd/controller.go was the one that set `upgradeable: True`

    upgradeable := operatorapi.OperatorCondition{
        Type:   conditionsPrefix + operatorapi.OperatorStatusTypeUpgradeable,
        Status: operatorapi.ConditionTrue,
    }

In the 4.13 bundle from IBM ROKS, these two conditions are set in cluster-scoped-resources/operator.openshift.io/storages/cluster.yaml

  - lastTransitionTime: "2023-11-08T14:22:21Z"
    status: "True"
    type: SnapshotCRDControllerUpgradeable
  - lastTransitionTime: "2023-11-08T14:22:21Z"
    reason: AsExpected
    status: "False"
    type: SnapshotCRDControllerDegraded

So the SnapshotCRDController is running and sets `upgradeable: True` on 4.13.

But in the 4.14 bundle, SnapshotCRDController no longer exists.

https://github.com/openshift/cluster-storage-operator/pull/385/commits/fa9af3aad65b9d0e9c618453825e4defeaad59ac

So in 4.14+ it's pkg/operator/defaultstorageclass/controller.go that should set the condition

https://github.com/openshift/cluster-storage-operator/blob/dbb1514dbf9923c56a4a198374cc59e45f9bc0cc/pkg/operator/defaultstorageclass/controller.go#L97-L100

But that only happens if `syncErr == unsupportedPlatformError`, and not if `syncErr == supportedByCSIError`, which is the case with the IBM VPC driver.

  - lastTransitionTime: "2023-11-08T14:22:23Z"
    message: 'DefaultStorageClassControllerAvailable: StorageClass provided by supplied
      CSI Driver instead of the cluster-storage-operator'
    reason: AsExpected
    status: "True"
    type: Available

So what controller will set `upgradeable: True` for IBM VPC?
IBM VPC uses this StatusFilter function for ROKS:

https://github.com/openshift/cluster-storage-operator/blob/dbb1514dbf9923c56a4a198374cc59e45f9bc0cc/pkg/operator/csidriveroperator/csioperatorclient/ibm-vpc-block.go#L17-L27

ROKS and AzureStack are the only deployments using a StatusFilter function...
So shouldRunController returns false here because the platform is ROKS:

https://github.com/openshift/cluster-storage-operator/blob/dbb1514dbf9923c56a4a198374cc59e45f9bc0cc/pkg/operator/csidriveroperator/driver_starter.go#L347-L349

Which means there is no controller to set `upgradeable: True`

Version-Release number of selected component (if applicable):

4.14.0+

How reproducible:

Always

Steps to Reproduce:

1. Install 4.14 via IBM ROKS
2. Check status conditions in cluster-scoped-resources/config.openshift.io/clusteroperators/storage.yaml

Actual results:

upgradeable=Unknown

Expected results:

upgradeable=True

Additional info:

4.13 IBM ROKS must-gather:
https://github.com/Joseph-Goergen/ibm-roks-toolkit/releases/download/test/must-gather-4.13.tar.gz

4.14 IBM ROKS must-gather: 
https://github.com/Joseph-Goergen/ibm-roks-toolkit/releases/download/test/must-gather.tar.gz

This is a clone of issue OCPBUGS-18371. The following is the description of the original issue:

Description of problem:

In the quick search, if you search for the word net you can see two options with the same name and description; one is for the source-to-image option and the other is for the sample option, but there is no way to differentiate them in quick search.

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:

1. Go to topology or Add page and select quick search
2. Search for net or node; you will see confusing options
3.

Actual results:

Similar options with no differentiation in the quick search menu

Expected results:

Some way to differentiate different options in the quick search menu

Additional info:


User Story

As a user I want to see what differs between the Machine's (current) ProviderSpec and the Control Plane Machine Set (desired) ProviderSpec so that I can understand why the CPMSO is replacing my control plane machine.

Background

Work spawn out of discussions in https://redhat-internal.slack.com/archives/CCX9DB894/p1678820665803259 and https://redhat-internal.slack.com/archives/C04UB95G802 

We believe we are already logging this; it would be good to emit either an event or the diff into the status. Whoever takes this card should investigate the best way of surfacing this.

Outcome:

  • Decision on event/status/both
  • If status, API design scoped out
  • Cards written for implementation

Steps

  • PR
  • update tests

Stakeholders

  • <Who is interested in this/where did they request this>

Definition of Done

  • <Add items that need to be completed for this card>
  • Docs
  • <Add docs requirements for this card>
  • Testing
  • <Explain testing that will be added>

Description of problem:

When checking the bug https://issues.redhat.com/browse/OCPBUGS-15976, we found that the default ingresscontroller's DNSReady condition is True even though the DNS records failed to be published to the public zone, and co/ingress doesn't report any error.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-05-191022

How reproducible:

100%

Steps to Reproduce:

1. install Azure cluster configured for manual mode with Azure Workload Identity 

2. check dnsrecords of default-wildcard
$ oc -n openshift-ingress-operator get dnsrecords default-wildcard -oyaml
<---snip--->
  - conditions:
    - lastTransitionTime: "2023-07-10T04:23:55Z"
      message: 'The DNS provider failed to ensure the record: failed to update dns ......
      reason: ProviderError
      status: "False"
      type: Published
    dnsZone:
      id: /subscriptions/xxxxx/resourceGroups/os4-common/providers/Microsoft.Network/dnszones/qe.azure.devcluster.openshift.com

3. Check ingresscontroller status
$ oc -n openshift-ingress-operator get ingresscontroller default -oyaml
<---snip--->
  - lastTransitionTime: "2023-07-10T04:23:55Z"
    message: The record is provisioned in all reported zones.
    reason: NoFailedZones
    status: "True"
    type: DNSReady

4. Check co/ingress status
$ oc get co/ingress
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.14.0-0.nightly-2023-07-05-191022   True        False         False      127m    

Actual results:

1. DNSReady is True and message shows: The record is provisioned in all reported zones.
2. co/ingress doesn't report any error

Expected results:

DNSReady should be False since the record failed to be published to the public zone

Additional info:

 

Description of problem:

The MCO's "Certificate Observability" CRD fields (introduced in MCO-607) are non-RFC3339 formatted strings and are unparseable as the API standard metav1.Time

For context, the MCO is currently migrating its API to openshift/api where it needs to comply with API standards, and if these strings are still present in the API when 4.14 ships, we will be unable to upgrade from the shipping version to the one where the API has migrated, so we need to adjust this now before it ships. 

Version-Release number of selected component (if applicable):

 

How reproducible:

100% 

Steps to Reproduce:

1.Create a cluster
2.Observe ControllerConfig status.controllerCertificates
3.Observe MachineConfigPool status.certExpirys

Actual results:

The types are wrong, and the strings are formatted like this: 2033-08-12 01:47:54 +0000 UTC

Expected results:

ControllerConfig and MachineConfigPools do not contain certificate observability fields formatted as "2033-08-12 01:47:54 +0000 UTC".

They should either contain certificate observability fields formatted as RFC3339 ("2006-01-02T15:04:05Z07:00") or not contain them at all.
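For reference, a small sketch of the formatting difference (standard apimachinery behaviour; the struct is illustrative):

package main

import (
	"encoding/json"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type certExpiry struct {
	Expiry metav1.Time `json:"expiry"`
}

func main() {
	t := time.Date(2033, 8, 12, 1, 47, 54, 0, time.UTC)

	// What the fields currently hold: Go's default time.Time formatting.
	fmt.Println(t.String()) // 2033-08-12 01:47:54 +0000 UTC  (not RFC3339)

	// What a metav1.Time API field serializes to: RFC3339.
	out, _ := json.Marshal(certExpiry{Expiry: metav1.NewTime(t)})
	fmt.Println(string(out)) // {"expiry":"2033-08-12T01:47:54Z"}
}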

Additional info:

If we ship 4.14 with these strings as they are, we will be stuck like that and unable to easily upgrade out of it (because the new MCO, which regards the fields as metav1.Time, will be unable to parse the old strings), e.g.

2023-08-15T05:03:40.989575279Z W0815 05:03:40.989527 1 reflector.go:533] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:101: failed to list *v1.MachineConfigPool: parsing time "2033-08-12 01:47:54 +0000 UTC" as "2006-01-02T15:04:05Z07:00": cannot parse " 01:47:54 +0000 UTC" as "T"
2023-08-15T05:03:40.989575279Z E0815 05:03:40.989555 1 reflector.go:148] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:101: Failed to watch *v1.MachineConfigPool: failed to list *v1.MachineConfigPool: parsing time "2033-08-12 01:47:54 +0000 UTC" as "2006-01-02T15:04:05Z07:00": cannot parse " 01:47:54 +0000 UTC" as "T"
2023-08-15T05:04:05.304139210Z W0815 05:04:05.304088 1 reflector.go:533] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:101: failed to list *v1.ControllerConfig: parsing time "2033-08-12 01:47:54 +0000 UTC" as "2006-01-02T15:04:05Z07:00": cannot parse " 01:47:54 +0000 UTC" as "T"
2023-08-15T05:04:05.304139210Z E0815 05:04:05.304121 1 reflector.go:148] github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:101: Failed to watch *v1.ControllerConfig: failed to list *v1.ControllerConfig: parsing time "2033-08-12 01:47:54 +0000 UTC" as "2006-01-02T15:04:05Z07:00": cannot parse " 01:47:54 +0000 UTC" as "T"


Description of problem:

According to the slack thread attached: Cluster uninstallation is stuck when load balancers are removed before ingress controllers. This can happen when the ingress controller removal fails and the control plane operator moves on to deleting load balancers without waiting.

Code ref https://github.com/openshift/hypershift/blob/248cea4daef9d8481c367f9ce5a5e0436e0e028a/control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go#L1505-L1520

Version-Release number of selected component (if applicable):

4.12.z 4.13.z

How reproducible:

Whenever the load balancer is deleted before the ingress controller

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

Load balancer deletion waits for the ingress controller deletion

Additional info:

 

Slack: https://redhat-internal.slack.com/archives/C04EUL1DRHC/p1681310121904539?thread_ts=1681216434.676009&cid=C04EUL1DRHC 

The issue:

An interesting issue came up on #forum-ui-extensibility. There was an attempt to use extensions to nest a details page under a details page that contained a horizontal nav. This caused an issue with rendering the page content when a sub link was clicked – which caused confusion.

The why:

The reason this happened was the resource details page had a tab that contained a resource list page. This resource list page showed a number of items of CRs that when clicked would try to append their name onto the URL. This confused the navigation, thinking that this path must be another tab, so no tabs were selected and no content was visible. The goal was to reuse this longer path name as a details page of its own with its own horizontal nav. This issue is a conceptual misunderstanding of the way our list & details pages work in OpenShift Console.

List Pages are sometimes found via direct navigation links. List pages are almost all shown on the Search page, allowing a user to navigate to both existing nav items and other non-primary resources.

Details Pages are individual items found in the List Pages (a row). These are stand alone pages that show details of a singular CR and optionally can have tabs that list other resources – but they always transition to a fresh Details page instead of compounding on the currently visible one.

The ask:

If we could document this in a fashion that can help Plugin developers share the same UX that the rest of the Console does then we will have a more unified approach to UX within the Console and through any installed Plugins.

Description of problem:

The chk_default_ingress.sh script for keepalived is not correctly matching the default ingress pod name anymore. The pod name in a recently deployed dev-scripts cluster is router-default-97fb6b94c-wfxfk, which does not match our grep pattern of router-default-[[:xdigit:]]\{10\}-[[:alnum:]]\{5\}. The main issue seems to be that the first id is only 9 digits, not 10.
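A hedged sketch of a more tolerant pattern (not the actual script change; the replica-set hash length and alphabet vary, so the character class and length range are loosened):

# Sketch only: accept both 9- and 10-character replica-set hashes.
pattern='router-default-[[:alnum:]]{9,10}-[[:alnum:]]{5}'
echo 'router-default-97fb6b94c-wfxfk' | grep -E "$pattern" && echo matched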

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Unsure, but has been seen at least twice

Steps to Reproduce:

1. Deploy recent nightly build
2. Look at chk_default_ingress status
3.

Actual results:

Always failing, even on nodes with the default ingress pod

Expected results:

Passes on nodes with default ingress pod

Additional info:

 

Description of problem:

test "operator conditions control-plane-machine-set" fails https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1574/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade/1634410710559625216
The control-plane-machine-set operator is Unavailable because it doesn't reconcile node events. If a node becomes ready later than the referencing Machine, the Node update event will not trigger reconciliation.

Version-Release number of selected component (if applicable):

 

How reproducible:

depends on the sequence of Node vs Machine events

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

operator logs 
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1574/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade/1634410710559625216/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/pods/openshift-machine-api_control-plane-machine-set-operator-5d5848c465-g4q2p_control-plane-machine-set-operator.log

machines 
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1574/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade/1634410710559625216/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/machines.json

nodes 
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1574/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-upgrade/1634410710559625216/artifacts/e2e-aws-ovn-upgrade/gather-extra/artifacts/nodes.json


After running several scale tests on a large cluster (252 workers), etcd ran out of space and became unavailable.

 

These tests consisted of running our node-density workload (Creates more than 50k pause pods) and cluster-density 4k several times (creates 4k namespaces with https://github.com/cloud-bulldozer/e2e-benchmarking/tree/master/workloads/kube-burner#cluster-density-variables).

 

The actions above led the etcd peers to run out of free space in their 4GiB PVCs, presenting the following error trace:

{"level":"warn","ts":"2023-03-31T09:50:57.532Z","caller":"rafthttp/http.go:271","msg":"failed to save incoming database snapshot","local-member-id":"b14198cd7f0eebf1","remote-snapshot-sender-id":"a4e894c3f4af1379","incoming-snapshot-index":19490191,"error":"write /var/lib/data/member/snap/tmp774311312: no space left on device"}

 

Etcd uses 4GiB PVCs to store its data, which seems to be insufficient for this scenario. In addition, unlike non-HyperShift clusters, we're not applying any periodic database defragmentation (normally done by cluster-etcd-operator), which can lead to a higher database size.
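For context, this is the kind of periodic maintenance cluster-etcd-operator normally performs; a rough manual equivalent (run against the etcd pods with the appropriate ETCDCTL certificates/endpoints, shown here only as a sketch) would be:

# Compact to the latest revision, then defragment to reclaim space on disk.
rev=$(etcdctl endpoint status --write-out=json | jq -r '.[0].Status.header.revision')
etcdctl compact "$rev"
etcdctl defrag --cluster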

 

The graph below represents the metrics etcd_mvcc_db_total_size_in_bytes and etcd_mvcc_db_total_size_in_use_in_bytes.

 

 

Description of problem:

Viewing the OperatorHub details page returns an error page

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2023-03-28-180259

How reproducible:

Always on Hypershift Guest cluster

Steps to Reproduce:

1. Visit OperatorHub details page via Administration -> Cluster Settings -> Configuration -> OperatorHub 
2.
3.

Actual results:

Cannot read properties of undefined (reading 'sources')

Expected results:

page can be loaded successfully

Additional info:

screenshot one: https://drive.google.com/file/d/12cgpChKYuen2v6DWvmMrir273wONo5oY/view?usp=share_link
screenshot two: https://drive.google.com/file/d/1vVsczu7ScIqznoKNsR8V0w4k9bF1xWhB/view?usp=share_link 

This is a clone of issue OCPBUGS-18772. The following is the description of the original issue:

MCO installs the resolve-prepender NetworkManager script on the nodes. In order to find out node details it needs to pull baremetalRuntimeCfgImage. However, this image only needs to be pulled the first time; in follow-up attempts the script just verifies that the image is available.

This is not desirable in situations where the mirror / quay is not available or has a temporary problem; these kinds of issues should not prevent the node from starting kubelet. During certificate rotation testing I noticed that a node with a significant time skew won't start kubelet, as it tries to pull baremetalRuntimeCfgImage before kubelet starts, even though the image is already on the node and doesn't need refreshing.
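A hedged sketch of the guard being asked for (podman used as an example runtime; the image reference is a placeholder):

# Sketch: only pull when the image is not already present locally,
# so an unreachable mirror does not block kubelet startup.
IMAGE="example.registry/baremetal-runtimecfg@sha256:placeholder"
if ! podman image exists "$IMAGE"; then
  podman pull "$IMAGE"
fi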

This is a clone of issue OCPBUGS-22930. The following is the description of the original issue:

Description of problem:

When a user selects a supported-but-not-recommended update target, it's currently rendered as a DropdownWithSwitch that is collapsed by default. That forces the user to perform an extra click to see the message explaining the risk they are considering accepting. We should remove the toggle and always expand that message, because understanding the risk is a critical part of deciding whether you accept it.

Version-Release number of selected component (if applicable):

Since console landed support for conditional update risks. Not a big enough deal to backport that whole way.

How reproducible:

Every time.

Steps to Reproduce:

OTA-520 explains how to create dummy data for testing the conditional update UX pre-merge and/or on nightly builds that are not part of the usual channels yet.

Actual results:

Expected results:

The same view, but without the down-v toggle, because the text should not be collapsible.

Description of problem:

If a JSON schema used by a chart contains an unknown value format (non-standard in JSON Schema but valid in the OpenAPI spec, for example), the Helm form view hangs on validation and stays in the "submitting" state.

 

As per JSON Schema standard the "format" keyword should only take an advisory role (like an annotation) and should not affect validation.

https://json-schema.org/understanding-json-schema/reference/string.html#format 
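For illustration, a hedged sketch of a values.schema.json fragment that triggers this; the format values are OpenAPI-style and not among the JSON Schema built-ins (this is not the actual chart's schema):

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "replicaCount": {
      "type": "integer",
      "format": "int32"
    },
    "adminPassword": {
      "type": "string",
      "format": "password"
    }
  }
}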

Version-Release number of selected component (if applicable):

Verified against 4.13, but probably applies to others.

How reproducible:

100%

Steps to Reproduce:

1. Go to Helm tab.
2. Click create in top right and select Repository
3. Paste following into YAML view and click Create:

apiVersion: helm.openshift.io/v1beta1
kind: ProjectHelmChartRepository
metadata:
  name: reproducer
spec:
  connectionConfig:
    url: 'https://raw.githubusercontent.com/tumido/helm-backstage/repo-multi-schema2'

4. Go to the Helm tab again (if redirected elsewhere)
5. Click create in top right and select Helm Release
6. In catalog filter select Chart repositories: Reproducer
7. Click on the single tile available (Backstage) and click Create
8. Switch to Form view
9. Leave default values and click Create
10. Stare at the always loading screen that never proceeds further.

Actual results:

The form stays in the submitting state and never finishes or displays any error in the UI.

Expected results:

Unknown format should not result in rejected validation. JSON Schema standard says that formats should not be used for validation.

Additional info:

This is not a schema violation by itself since Helm itself is happy about it and doesn't complain. The same chart can be successfully deployed via the YAML view.

Description of problem:

The modal displayed when installing a Helm chart shows a Documentation link field. This field can never be populated with a value and is always N/A.

Annotation for documentation URL doesn't exist in https://github.com/redhat-certification/chart-verifier/blob/main/docs/helm-chart-annotations.md#provider-annotations

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Go to Helm chart catalog
2. View any chart
3. See documentation = "N/A"

Actual results:

N/A

Expected results:

A way to populate the value

Additional info:

The value is consumed here: https://github.com/openshift/console/blob/2e8624014065d09ba40164221dd612d882f20395/frontend/packages/console-shared/src/components/catalog/details/CatalogDetailsPanel.tsx

But it is never extracted from a chart:
https://github.com/openshift/console/blob/2e8624014065d09ba40164221dd612d882f20395/frontend/packages/helm-plugin/src/catalog/utils/catalog-utils.tsx#L138

It is probably because no such annotation exists in chart certification requirements/recommendations:
https://github.com/redhat-certification/chart-verifier/blob/main/docs/helm-chart-annotations.md#provider-annotations

Testgrid for single-node-workers-upgrade-conformance shows that tests are failing due to the 'KubeMemoryOvercommit' alert.

We should avoid failing on this alert for single node environments assuming it's ok to overcommit memory on single node Openshift clusters.

Ref: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1687375398906129

Description of problem:

After further discussion about https://issues.redhat.com/browse/RFE-3383 we have concluded that it needs to be addressed in 4.12 since OVNK will be default there. I'm opening this so we can backport the fix.

The fix for this is simply to alter the logic around enabling nodeip-configuration to handle the VSphere-unique case of platform type == "vsphere" and the VIP field is not populated.
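A hedged sketch of that enablement logic (the platform list and types here are illustrative, not the actual MCO/installer templates):

package nodeipsketch

// enableNodeIPConfiguration: on vSphere, only enable the nodeip-configuration
// service when no API VIP is configured, which is the case this bug is about.
func enableNodeIPConfiguration(platformType string, apiVIPs []string) bool {
	switch platformType {
	case "baremetal", "openstack", "ovirt", "nutanix":
		return true
	case "vsphere":
		return len(apiVIPs) == 0
	default:
		return false
	}
}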

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Will require the following:

  • fork webhook
  • make part of build process + OCP build dockerfile receival
  • write CCO controller which deploys the webhook

https://azure.github.io/azure-workload-identity/docs/installation/mutating-admission-webhook.html

Background

  • We deploy the AWS STS pod identity webhook as a customer convenience for configuring their applications to utilize service account tokens minted by a cluster that supports STS. When you create a pod that references a service account, the webhook looks for annotations on that service account and if found, the webhook mutates the deployment in order to set environment variables + mounts the service account token on that deployment so that the pod has everything it needs to make an API client.
  • Our temporary access token (using TAT in place of STS because STS is AWS specific) enablement for (select) third party operators does not rely on the webhook and is instead using CCO to create a secret containing the variables based on the credentials requests. The service account token is also explicitly mounted for those operators. Pod identity webhooks were considered as an alternative to this approach but weren't chosen.
  • Basically, if we deploy this webhook it will be for customer convenience and will enable us to potentially use the Azure pod identity webhook in the future if we so chose. Note that AKS provides this webhook and other clouds like Google offer a webhook solution for configuring customer applications.
  • This is about providing parity with other solutions but not required for anything directly related to the product.
  • If we don't provide this Azure pod identity webhook method, customer would need to get the details via some other way like a secret or set explicitly as environment variables. With the webhook, you just annotate your service account.
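For context, the customer-side convenience looks roughly like this (annotation/label keys as documented by the upstream azure-workload-identity project linked above; all values are placeholders):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-workload            # placeholder
  namespace: my-namespace      # placeholder
  annotations:
    azure.workload.identity/client-id: "00000000-0000-0000-0000-000000000000"  # placeholder
---
apiVersion: v1
kind: Pod
metadata:
  name: my-workload
  namespace: my-namespace
  labels:
    azure.workload.identity/use: "true"   # opts the pod in to mutation by the webhook
spec:
  serviceAccountName: my-workload
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder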

Description of problem:

After all cluster operators have reconciled after the password rotation, we can still see authentication failures in keystone (attached screenshot of splunk query)

Version-Release number of selected component (if applicable):

Environment:
- OpenShift 4.12.10 on OpenStack 16
- The cluster is managed via RHACM, but password rotation shall be done via "regular"  OpenShift means.

How reproducible:

Rotated the OpenStack credentials according to the documentation [1]

[1] https://docs.openshift.com/container-platform/4.12/authentication/managing_cloud_provider_credentials/cco-mode-passthrough.html#manually-rotating-cloud-creds_cco-mode-passthrough 

Additional info:

- We can't trace back where these authentication failures come from; they do disappear after a cluster upgrade (when nodes are rebooted and all pods are restarted), which indicates that there's still a component using the old credentials.
- The relevant technical integration points _seem_ to be working though (LBaaS, CSI, Machine API, Swift)

What is the business impact? Please also provide timeframe information.

- We cannot rely on Splunk monitoring for authentication issues since it's currently constantly showing authentication errors.
- We cannot be entirely sure that everything works as expected since we don't know which component doesn't seem to be using the new credentials.

 

Description of problem:

 

The customer used the Agent-based installer to install 4.13.8 in their CID env, but during the install process the bootstrap machine had an OOM issue; checking the sosreport shows the init container was OOM-killed.

NOTE: The issue is not seen when testing with 4.13.6, per the customer.

initContainers:
- name: machine-config-controller
  image: .Images.MachineConfigOperator
  command: ["/usr/bin/machine-config-controller"]
  args:
  - "bootstrap"
  - "--manifest-dir=/etc/mcc/bootstrap"
  - "--dest-dir=/etc/mcs/bootstrap"
  - "--pull-secret=/etc/mcc/bootstrap/machineconfigcontroller-pull-secret"
  - "--payload-version=.ReleaseVersion"
  resources:
    limits:
      memory: 50Mi

We found that the sosreport dmesg and crio logs show the machine-config-controller container being OOM-killed; the kill came from the cgroup limit, so it looks like the 50Mi limit is too small.

The customer used a physical machine that had 100GB of memory

The customer had some network config in the assisted-install YAML file; maybe the issue is that they had some NIC config?

log files:
1. sosreport
https://attachments.access.redhat.com/hydra/rest/cases/03578865/attachments/b5501734-60be-4de4-adcf-da57e22cbb8e?usePresignedUrl=true

2. assisted installer yaml file
https://attachments.access.redhat.com/hydra/rest/cases/03578865/attachments/a32635cf-112d-49ed-828c-4501e95a0e7a?usePresignedUrl=true

3. bootstrap machine oom screenshot
https://attachments.access.redhat.com/hydra/rest/cases/03578865/attachments/eefe2e57-cd23-4abd-9e0b-dd45f20a34d2?usePresignedUrl=true

Description of problem:

when applying a CSV with the current label recommendation for STS, the following error occurs:

error creating csv ack-s3-controller.v1.0.3: ClusterServiceVersion.operators.coreos.com "ack-s3-controller.v1.0.3" is invalid: metadata.annotations: Invalid value: "operators.openshift.io/infrastructure-features/token-auth/aws": a qualified name must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my.name', or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]') with an optional DNS subdomain prefix and '/' (e.g. 'example.com/MyName')

Version-Release number of selected component (if applicable):

4.14

How reproducible:

always

Steps to Reproduce:

1. create a CSV with an annotation "operators.openshift.io/infrastructure-features/token-auth/aws: `false`"
2. apply the CSV on cluster

Actual results:

fails with the above error

Expected results:

should not fail

Additional info:
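For illustration only, a key of the valid single-prefix/name shape would look like the following; treat this purely as a format example, since the exact annotation key that was eventually adopted may differ:

metadata:
  annotations:
    # Valid shape: one optional DNS-subdomain prefix, a single "/", then the name.
    features.operators.openshift.io/token-auth-aws: "false"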

 

Description of problem:

The topology page crashes

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

1. Visit developer console
2. Topology view
3.

Actual results:

Error message:
TypeError
Description:
e is null
Component trace:
f@https://console-openshift-console.apps.cl2.cloud.local/static/vendors~app/code-refs/actions~delete-revision~dev-console-add~dev-console-deployImage~dev-console-ed~cf101ec3-chunk-5018ae746e2320e4e737.min.js:26:14244
5363/t.a@https://console-openshift-console.apps.cl2.cloud.local/static/dev-console-topology-chunk-492be609fb2f16849dfa.min.js:1:177913
u@https://console-openshift-console.apps.cl2.cloud.local/static/dev-console-topology-chunk-492be609fb2f16849dfa.min.js:1:275718
8248/t.a<@https://console-openshift-console.apps.cl2.cloud.local/static/dev-console-topology-chunk-492be609fb2f16849dfa.min.js:1:475504
i@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:470135
withFallback()
5174/t.default@https://console-openshift-console.apps.cl2.cloud.local/static/dev-console-topology-chunk-492be609fb2f16849dfa.min.js:1:78258
s@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:237096
[...]
ne<@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:1592411
r@https://console-openshift-console.apps.cl2.cloud.local/static/vendors~main-chunk-12b31b866c0a4fea4c58.min.js:36:125397
t@https://console-openshift-console.apps.cl2.cloud.local/static/vendors~main-chunk-12b31b866c0a4fea4c58.min.js:21:58042
t@https://console-openshift-console.apps.cl2.cloud.local/static/vendors~main-chunk-12b31b866c0a4fea4c58.min.js:21:60087
t@https://console-openshift-console.apps.cl2.cloud.local/static/vendors~main-chunk-12b31b866c0a4fea4c58.min.js:21:54647
re@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:1592722
t.a@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:791129
t.a@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:1062384
s@https://console-openshift-console.apps.cl2.cloud.local/static/main-chunk-378881319405723c0627.min.js:1:613567
t.a@https://console-openshift-console.apps.cl2.cloud.local/static/vendors~main-chunk-12b31b866c0a4fea4c58.min.js:141:244663

Expected results:

No error should be there

Additional info:

Cloud Pak Operator is installed 

This is a clone of issue OCPBUGS-18641. The following is the description of the original issue:

Description of problem:

vSphere Dual-stack install fails in bootstrap.
All nodes have the node.cloudprovider.kubernetes.io/uninitialized taint

cloud-controller-manager can't find the nodes?

I0906 15:05:22.922183       1 search.go:49] WhichVCandDCByNodeID called but nodeID is empty
E0906 15:05:22.922187       1 nodemanager.go:197] shakeOutNodeIDLookup failed. Err=nodeID is empty

Version-Release number of selected component (if applicable):

4.14.0-0.ci.test-2023-09-06-141839-ci-ln-98f4iqb-latest

How reproducible:

Always

Steps to Reproduce:

1. Install vSphere IPI with OVN Dual-stack
platform:
  vsphere:
    apiVIPs:
      - 192.168.134.3
      - fd65:a1a8:60ad:271c::200
    ingressVIPs:
      - 192.168.134.4
      - fd65:a1a8:60ad:271c::201
networking:
  networkType: OVNKubernetes
  machineNetwork:
  - cidr: 192.168.0.0/16
  - cidr: fd65:a1a8:60ad:271c::/64
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  - cidr: fd65:10:128::/56
    hostPrefix: 64
  serviceNetwork:
  - 172.30.0.0/16
  - fd65:172:16::/112

Actual results:

Install fails in bootstrap

Expected results:

Install succeeds

Additional info:

I0906 15:03:21.393629       1 search.go:69] WhichVCandDCByNodeID by UUID
I0906 15:03:21.393632       1 search.go:76] WhichVCandDCByNodeID nodeID: 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.406797       1 search.go:208] Found node 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.406816       1 search.go:210] Hostname: ci-ln-bllxr6t-c1627-5p7mq-master-2, UUID: 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.406830       1 nodemanager.go:159] Discovered VM using normal UUID format
I0906 15:03:21.416168       1 nodemanager.go:268] Adding Hostname: ci-ln-bllxr6t-c1627-5p7mq-master-2
I0906 15:03:21.416218       1 nodemanager.go:438] Adding Internal IP: 192.168.134.60
I0906 15:03:21.416229       1 nodemanager.go:443] Adding External IP: 192.168.134.60
I0906 15:03:21.416244       1 nodemanager.go:349] Found node 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.416266       1 nodemanager.go:351] Hostname: ci-ln-bllxr6t-c1627-5p7mq-master-2 UUID: 421b78c3-f8bb-970c-781b-76827306e89e
I0906 15:03:21.416278       1 instances.go:77] instances.NodeAddressesByProviderID() FOUND with 421b78c3-f8bb-970c-781b-76827306e89e
E0906 15:03:21.416326       1 node_controller.go:236] error syncing 'ci-ln-bllxr6t-c1627-5p7mq-master-2': failed to get node modifiers from cloud provider: provided node ip for node "ci-ln-bllxr6t-c1627-5p7mq-master-2" is not valid: failed to get node address from cloud provider that matches ip: fd65:a1a8:60ad:271c::70, requeuing
I0906 15:03:21.623573       1 instances.go:102] instances.InstanceID() CACHED with ci-ln-bllxr6t-c1627-5p7mq-master-1

While running the e2e test locally against a Hypershift cluster from cluster-bot, I noticed that it fails on the step that waits for 2 Prometheus instances.

“wait for prometheus-k8s: expected 2 Prometheus instances but got: 1: timed out waiting for the condition” 

Since Hypershift clusters from cluster-bot have a single worker node, this will always fail, because main_test.go checks that there are always 2 instances.

Ideally we need to check the infrastructureTopology field and adjust the test if the infrastructure is “SingleReplica”.
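
For reference, that field lives on the cluster-scoped Infrastructure config resource; a sketch of what the test could gate on (the value shown illustrates the single-replica case):

apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  name: cluster
status:
  # HighlyAvailable clusters expect 2 Prometheus replicas; SingleReplica expects only 1
  infrastructureTopology: SingleReplica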

 

Description of problem:

Add telemetry metrics for Red Hat Advanced Cluster Security (RHACS) to OpenShift Telemeter.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Look up telemetry-config in openshift-monitoring.

Actual results:

telemetry-config in openshift-monitoring does not contain RHACS metrics.

Expected results:

telemetry-config in openshift-monitoring does contain RHACS metrics.

Additional info:

 

Context:

In 4.14, the kubelet config from the MCO payload comes with --cloud-provider=external, which means the node.cloudprovider.kubernetes.io/uninitialized taint is set, preventing workloads from being scheduled until it is cleaned up by the external cloud provider.

This has come as a result of AWS removing their in-tree provider implementation for K8s 1.27
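
For reference, the taint in question looks roughly like this on an uninitialized Node (a sketch of the standard external-cloud-provider taint):

spec:
  taints:
  - key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
    effect: NoSchedule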

DoD:

We need to let the CPO run the AWS external cloud provider.

Description of problem:

When the user selects the "Use Pipeline from this cluster" option in the Add Pipeline section, the Create button should be enabled, but due to PAC validation the Create button stays disabled.

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

Always

Steps to Reproduce:

1. Go to Import from Git page
2. Add repository https://bitbucket.org/lokanandap/hello-func
3. Select Use Pipeline from this cluster in Add Pipeline section 

Actual results:

Create button is disabled

Expected results:

Create button should be enabled to create the workload

Additional info:

 

Description of problem:

Under heavy control plane load (bringing up ~200 pods), prometheus/promtail spikes to over 100% CPU, node_exporter goes to ~200% cpu and stays there for 5-10 minutes. Tested on a GCP cluster bot using 2 physical core (4 vcpu) workers. This starves out essential platform functions like OVS from getting any CPU and causes the data plane to go down.

Running perf against node_exporter reveals the application is consuming the majority of its CPU trying to list new interfaces being added in sysfs. This looks like it is due to disabling netlink via:

https://issues.redhat.com/browse/OCPBUGS-8282

This operation grabs the rtnl lock which can compete with other components on the host that are trying to configure networking.

Version-Release number of selected component (if applicable):

Tested on 4.13 and 4.14 with GCP.

How reproducible:

3/4 times

Steps to Reproduce:

1. Launch gcp with cluster bot
2. Create a deployment with pause containers which will max out pods on the nodes:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webserver-deployment
  namespace: openshift-ovn-kubernetes
  labels:
    pod-name: server
    app: nginx
    role: webserver
spec:
  replicas: 700
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
        role: webserver
    spec:
      containers:
        - name: webserver1
          image: k8s.gcr.io/pause:3.1
          ports:
            - containerPort: 80
              name: serve-80
              protocol: TCP 
3. Watch top cpu output. Wait for node_exporter and prometheus to show very high CPU. If this does not happen, proceed to step 4.
4. Delete the deployment and then recreate it.
5. High and persistent CPU usage should now be observed.

Actual results:

CPU is pegged on the host for several minutes. The terminal is almost unresponsive. The only way to fix it was to delete the node_exporter and prometheus DS.

Expected results:

Prometheus and other metrics related applications should:
1. use netlink to avoid grabbing rtnl lock
2. should be CPU limited. Certain required applications in OCP are resource-unbounded (like the networking data plane) to ensure the node's core functions continue to work. Metrics, however, should be CPU limited to prevent the tooling from locking up a node.
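
As an illustration of the second point, a sketch of what CPU-limiting a metrics container could look like (the numbers are hypothetical, not a recommended configuration):

containers:
- name: node-exporter
  resources:
    requests:
      cpu: 8m
      memory: 32Mi
    limits:
      cpu: 250m   # hypothetical cap so the collector cannot starve node-critical components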

Additional info:

Perf summary (will attach full perf output)
    99.94%     0.00%  node_exporter  node_exporter      [.] runtime.goexit.abi0
            |
            ---runtime.goexit.abi0
               |
                --99.33%--github.com/prometheus/node_exporter/collector.NodeCollector.Collect.func2
                          |
                           --99.33%--github.com/prometheus/node_exporter/collector.NodeCollector.Collect.func1
                                     |
                                      --99.33%--github.com/prometheus/node_exporter/collector.execute
                                                |
                                                |--97.67%--github.com/prometheus/node_exporter/collector.(*netClassCollector).Update
                                                |          |
                                                |           --97.67%--github.com/prometheus/node_exporter/collector.(*netClassCollector).netClassSysfsUpdate
                                                |                     |
                                                |                      --97.67%--github.com/prometheus/node_exporter/collector.(*netClassCollector).getNetClassInfo
                                                |                                |
                                                |                                 --97.64%--github.com/prometheus/procfs/sysfs.FS.NetClassByIface
                                                |                                           |
                                                |                                            --97.64%--github.com/prometheus/procfs/sysfs.parseNetClassIface
                                                |                                                      |
                                                |                                                       --97.61%--github.com/prometheus/procfs/internal/util.SysReadFile
                                                |                                                                 |
                                                |                                                                  --97.45%--syscall.read
                                                |                                                                            |
                                                |                                                                             --97.45%--syscall.Syscall
                                                |                                                                                       |
                                                |                                                                                        --97.45%--runtime/internal/syscall.Syscall6
                                                |                                                                                                  |
                                                |                                                                                                   --70.34%--entry_SYSCALL_64_after_hwframe
                                                |                                                                                                             do_syscall_64
                                                |                                                                                                             |
                                                |                                                                                                             |--39.13%--ksys_read
                                                |                                                                                                             |          |
                                                |                                                                                                             |          |--31.97%--vfs_read

Description of problem:
This is a clone of the doc issue OCPBUGS-9162.

Importing JAR files doesn't work if the Cluster Samples Operator is not installed. This is a common issue in disconnected clusters where the Cluster Samples Operator is disabled by default. Users should not see the JAR import option if it is not working correctly.

Version-Release number of selected component (if applicable):
4.9+

How reproducible:
Always, when the samples operator is not installed

Steps to Reproduce:

  1. Setup a cluster without samples operator or uninstall all "Java" Builder Images (ImageStreams from the openshift namespace)
  2. Switch to the Developer perspective
  3. Navigate to Add > Import JAR file
  4. Upload a JAR file and press Create

Actual results:
Import doesn't work

Expected results:
The Import JAR file option should not be displayed if no "Java" Builder Image (ImageStream in the openshift namespace) is available

Additional info:

  1. https://docs.openshift.com/container-platform/4.9/applications/creating_applications/odc-creating-applications-using-developer-perspective.html#odc-deploying-java-applications_odc-creating-applications-using-developer-perspective
  2. https://docs.openshift.com/container-platform/4.11/post_installation_configuration/cluster-capabilities.html
  3. https://docs.openshift.com/container-platform/4.9/openshift_images/configuring-samples-operator.html
  4. https://github.com/jerolimov/openshift/blob/master/notes/cluster.md

Description of problem:

When creating a sample Devfile from the Samples page, the corresponding Topology icon for the app is not set. This issue is not observed when we create a BuilderImage from the Samples page.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1. Create a Sample Devfile App from the Samples Page
2. Go to the Topology Page and check the icon of the app created.

Actual results:

The generic Openshift logo is displayed

Expected results:

Need to show the corresponding app icon (Golang, Quarkus, etc.)

Additional info:

When creating a sample from a BuilderImage, the icon gets set properly according to the BuilderImage used.

Current label: app.openshift.io/runtime=dotnet-basic
Change to: app.openshift.io/runtime=dotnet

Please review the following PR: https://github.com/openshift/kube-rbac-proxy/pull/66

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-23432. The following is the description of the original issue:

Description of problem:

OCP v4.14.1 installation is failing because the VIP is not being allocated to the bootstrap node

Version-Release number of selected component (if applicable):

OCPv4.14.1

How reproducible:

100% --> https://access.redhat.com/support/cases/#/case/03668010

Steps to Reproduce:

1.
2.
3.

Actual results:

https://access.redhat.com/support/cases/#/case/03668010/discussion?commentId=a0a6R00000Vmdf3QAB

Expected results:

OCP installation ends successfully

Additional info:

The comment https://access.redhat.com/support/cases/#/case/03668010/discussion?commentId=a0a6R00000Vmdf3QAB describes the current state and the issue. If additional logs are required, I can arrange for them.

Description of problem:

The Azure CCM will panic when it loses its leader election lease. This is contrary to the behaviour of other components which exit intentionally.

See https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-azure-modern/1632791244243472384

Version-Release number of selected component (if applicable):


How reproducible:

Force the CCM to lose leader election, can happen during upgrades

Steps to Reproduce:

1.
2.
3.

Actual results:

Code will panic, eg 

E0306 18:09:14.315039       1 runtime.go:77] Observed a panic: leaderelection lost
goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1adc660?, 0x219b9c0})
	/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x81e22e?})
	/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x1adc660, 0x219b9c0})
	/usr/lib/golang/src/runtime/panic.go:884 +0x212
sigs.k8s.io/cloud-provider-azure/cmd/cloud-controller-manager/app.NewCloudControllerManagerCommand.func1.1()
	/go/src/github.com/openshift/cloud-provider-azure/cmd/cloud-controller-manager/app/controllermanager.go:138 +0x27
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1()
	/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:203 +0x1f
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc0002c0d80, {0x21bce08, 0xc0001ac008})
	/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:213 +0x14d
k8s.io/client-go/tools/leaderelection.RunOrDie({0x21bce08, 0xc0001ac008}, {{0x21c0e00, 0xc0002c0c60}, 0x1fe5d61a00, 0x18e9b26e00, 0x60db88400, {0xc000418080, 0x1fc4978, 0x0}, ...})
	/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:226 +0x94
sigs.k8s.io/cloud-provider-azure/cmd/cloud-controller-manager/app.NewCloudControllerManagerCommand.func1(0xc000170000?, {0x1ea43e2?, 0xd?, 0xd?})
	/go/src/github.com/openshift/cloud-provider-azure/cmd/cloud-controller-manager/app/controllermanager.go:130 +0x3a7
github.com/spf13/cobra.(*Command).execute(0xc000170000, {0xc00019e010, 0xd, 0xd})
	/go/src/github.com/openshift/cloud-provider-azure/vendor/github.com/spf13/cobra/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0xc000170000)
	/go/src/github.com/openshift/cloud-provider-azure/vendor/github.com/spf13/cobra/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/cloud-provider-azure/vendor/github.com/spf13/cobra/command.go:918
main.main()
	/go/src/github.com/openshift/cloud-provider-azure/cmd/cloud-controller-manager/controller-manager.go:47 +0xc5
panic: leaderelection lost [recovered]
	panic: leaderelection lost

goroutine 1 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x81e22e?})
	/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x1adc660, 0x219b9c0})
	/usr/lib/golang/src/runtime/panic.go:884 +0x212
sigs.k8s.io/cloud-provider-azure/cmd/cloud-controller-manager/app.NewCloudControllerManagerCommand.func1.1()
	/go/src/github.com/openshift/cloud-provider-azure/cmd/cloud-controller-manager/app/controllermanager.go:138 +0x27
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run.func1()
	/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:203 +0x1f
k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run(0xc0002c0d80, {0x21bce08, 0xc0001ac008})
	/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:213 +0x14d
k8s.io/client-go/tools/leaderelection.RunOrDie({0x21bce08, 0xc0001ac008}, {{0x21c0e00, 0xc0002c0c60}, 0x1fe5d61a00, 0x18e9b26e00, 0x60db88400, {0xc000418080, 0x1fc4978, 0x0}, ...})
	/go/src/github.com/openshift/cloud-provider-azure/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:226 +0x94
sigs.k8s.io/cloud-provider-azure/cmd/cloud-controller-manager/app.NewCloudControllerManagerCommand.func1(0xc000170000?, {0x1ea43e2?, 0xd?, 0xd?})
	/go/src/github.com/openshift/cloud-provider-azure/cmd/cloud-controller-manager/app/controllermanager.go:130 +0x3a7
github.com/spf13/cobra.(*Command).execute(0xc000170000, {0xc00019e010, 0xd, 0xd})
	/go/src/github.com/openshift/cloud-provider-azure/vendor/github.com/spf13/cobra/command.go:876 +0x67b
github.com/spf13/cobra.(*Command).ExecuteC(0xc000170000)
	/go/src/github.com/openshift/cloud-provider-azure/vendor/github.com/spf13/cobra/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/openshift/cloud-provider-azure/vendor/github.com/spf13/cobra/command.go:918
main.main()
	/go/src/github.com/openshift/cloud-provider-azure/cmd/cloud-controller-manager/controller-manager.go:47 +0xc5

Expected results:

Code should exit without panicking

Additional info:


Description of problem:

We have observed a situation where:
- A workload mounting multiple EBS volumes gets stuck in a Terminating state when it finishes.
- The node that the workload ran on eventually gets stuck draining, because it gets stuck on unmounting one of the volumes from that workload, despite no containers from the workload now running on the node.

What we observe via the node logs is that the volume seems to unmount successfully. Then it attempts to unmount a second time, unsuccessfully. This unmount attempt then repeats and holds up the node.

Specific examples from the node's logs to illustrate this will be included in a private comment. 

Version-Release number of selected component (if applicable):

4.11.5

How reproducible:

Has occurred on four separate nodes on one specific cluster, but the mechanism to reproduce it is not known.

Steps to Reproduce:

1.
2.
3.

Actual results:

A volume gets stuck unmounting, holding up removal of the node and completed deletion of the pod.

Expected results:

The volume should not get stuck unmounting.

Additional info:

 

This is a clone of issue OCPBUGS-19674. The following is the description of the original issue:

Description of problem:

 

Version-Release number of selected component (if applicable):

When using a route to expose the API server endpoint in a HostedCluster, the .status.controlPlaneEndpoint.port is reported as 6443 (the internal port) instead of 443 which is the port that is externally exposed via the route.

How reproducible:

Always

Steps to Reproduce:

1. Create a HostedCluster with a custom dns name using route as the strategy
2. Inspect .status.controlPlaneEndpoint

Actual results:

It has 6443 as the port

Expected results:

It has 443 as the port
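
For illustration, the expected HostedCluster status would look roughly like this (the hostname is hypothetical):

status:
  controlPlaneEndpoint:
    host: api.example.hypershift.example.com
    port: 443   # the externally exposed route port, rather than the internal 6443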

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/197

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of the problem:

vSphere vCenter cluster field is missing description

How reproducible:

always

Steps to reproduce:

1. install OCP on vSphere platform

2. Go to Overview -> vSphere, configure

Actual results:

vCenter cluster field is missing description

Expected results:

Description is present

Description of problem:

Selecting "Manual" for Update approval does not take effect.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

The control-plane-operator pod gets stuck deleting an awsendpointservice if its hostedzone is already gone:

Logs:

{"level":"error","ts":"2023-07-13T03:06:58Z","msg":"Reconciler error","controller":"awsendpointservice","controllerGroup":"hypershift.openshift.io","controllerKind":"AWSEndpointService","aWSEndpointService":{"name":"private-router","namespace":"ocm-staging-24u87gg3qromrf8mg2r2531m41m0c1ji-diegohcp-west2"},"namespace":"ocm-staging-24u87gg3qromrf8mg2r2531m41m0c1ji-diegohcp-west2","name":"private-router","reconcileID":"59eea7b7-1649-4101-8686-78113f27567d","error":"failed to delete resource: NoSuchHostedZone: No hosted zone found with ID: Z05483711XJV23K8E97HK\n\tstatus code: 404, request id: f8686dd6-a906-4a5e-ba4a-3dd52ad50ec3","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"} 

Version-Release number of selected component (if applicable):

4.12.24

How reproducible:

Have not tried to reproduce yet, but should be fairly reproducible

Steps to Reproduce:

1. Install a PublicAndPrivate or Private HCP
2. Delete the Route53 Hosted Zone defined in its awsendpointservice's .status.dnsZoneID field
3. Start an uninstall
4. Observe the control-plane-operator looping on the above logs and the uninstall hanging

Actual results:

Uninstall hangs due to CPO being unable to delete the awsendpointservice

Expected results:

awsendpointservice cleans up; if the hosted zone is already gone, the CPO shouldn't care that it can't list hosted zones

Additional info:

 

aws-ebs-csi-driver-operator ServiceAccount does not include the HCP pull-secret in its imagePullSecrets. Thus, if a HostedCluster is created with a `pullSecret` that contains creds that the management cluster pull secret does not have, the image pull fails.
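
A minimal sketch of what the fix implies for the ServiceAccount (the namespace and secret name below are hypothetical placeholders for the hosted control plane namespace and the HCP pull secret):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-ebs-csi-driver-operator
  namespace: clusters-example        # hypothetical hosted control plane namespace
imagePullSecrets:
- name: pull-secret                  # hypothetical name of the HCP pull secret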

Description of problem:

whereabouts reconciler is responsible for reclaiming dangling IPs, and freeing them to be available to allocate to new pods.
This is crucial for scenarios where the number of addresses is limited and dangling IPs prevent whereabouts from successfully allocating new IPs to new pods.

The reconciliation schedule is currently hard-coded to run once a day, without a user-friendly way to configure.

Version-Release number of selected component (if applicable):

    

How reproducible:

    Create a Whereabouts reconciler daemon set, not able to configure the reconciler schedule.

Steps to Reproduce:

    1. Create a Whereabouts reconciler daemonset
       instructions: https://docs.openshift.com/container-platform/4.14/networking/multiple_networks/configuring-additional-network.html#nw-multus-creating-whereabouts-reconciler-daemon-set_configuring-additional-network

     2. Run `oc get pods -n openshift-multus | grep whereabouts-reconciler`

     3. Run `oc logs whereabouts-reconciler-xxxxx`      

Actual results:

    You can't configure the cron-schedule of the reconciler.

Expected results:

    Be able to modify the reconciler cron schedule.

Additional info:

    The fix for this bug is in two places: whereabouts, and cluster-network-operator.
    For this reason, in order to verify correctly we need to use both fixed components.
    Please read below for more details about how to apply the new configurations.

How to Verify:

    Create a whereabouts-config ConfigMap with a custom value, and check in the
    whereabouts-reconciler pods' logs that it is updated and that it triggers the cleanup.

Steps to Verify:

    1. Create a Whereabouts reconciler daemonset
    2. Wait for the whereabouts-reconciler pods to be running. (takes time for the daemonset to get created).
    3. See in logs: "[error] could not read file: <nil>, using expression from flatfile: 30 4 * * *"
       This means it uses the hardcoded default value. (Because no ConfigMap yet)
    4. Run: oc create configmap whereabouts-config -n openshift-multus --from-literal=reconciler_cron_expression="*/2 * * * *"
    5. Check in the logs for: "successfully updated CRON configuration" 
    6. Check that in the next 2 minutes the reconciler runs: "[verbose] starting reconciler run"
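
As a declarative alternative to the oc create configmap command in step 4, the same ConfigMap as YAML (the cron expression mirrors the one used above):

apiVersion: v1
kind: ConfigMap
metadata:
  name: whereabouts-config
  namespace: openshift-multus
data:
  reconciler_cron_expression: "*/2 * * * *"   # run the reconciler every two minutes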

 

 

The code in our infrastructure test needs to be updated to make the test more accurate. Currently we are targeting gomock.Any() in many cases, which means the tests are not as accurate as they could be.

Updates should be similar to MGMT-13918

Description of problem:
The size of PVC/datadir-ibm-spectrum-scale-pmcollector-0 is displayed incorrectly in the OpenShift web console. The PVC size is shown as (negative) -17.6GiB.
Below are the SC, PV, and PVC details.

$ oc get storageclass
NAME                            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
ibm-spectrum-fusion-mgmt-sc     spectrumscale.csi.ibm.com      Delete          Immediate              true                   2d
ibm-spectrum-fusion (default)   spectrumscale.csi.ibm.com      Delete          Immediate              true                   2d
ibm-spectrum-scale-internal     kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  2d
ibm-spectrum-scale-sample       spectrumscale.csi.ibm.com      Delete          Immediate              false                  2d


$ oc get pv
control-1.ncw-az1-005.caas.bbtnet.com-pmcollector   25Gi          RWO           Retain           Bound    ibm-spectrum-scale/datadir-ibm-spectrum-scale-pmcollector-0                     ibm-spectrum-scale-internal  

$ oc get pvc  -A
NAMESPACE            NAME                                       STATUS   VOLUME                                              CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
ibm-spectrum-scale   datadir-ibm-spectrum-scale-pmcollector-0   Bound    control-1.ncw-az1-005.caas.bbtnet.com-pmcollector   25Gi       RWO            ibm-spectrum-scale-internal   3d


$ oc get pvc datadir-ibm-spectrum-scale-pmcollector-0 -n ibm-spectrum-scale
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: 'yes'
    pv.kubernetes.io/bound-by-controller: 'yes'
  resourceVersion: '5360546'
  name: datadir-ibm-spectrum-scale-pmcollector-0
  uid: 7a7d0609-0608-409f-91e1-209bb0b3c8d1
  creationTimestamp: '2023-05-01T14:13:40Z'
  managedFields:
    - manager: kube-controller-manager
      operation: Update
      apiVersion: v1
      time: '2023-05-01T14:13:40Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:pv.kubernetes.io/bind-completed': {}
            'f:pv.kubernetes.io/bound-by-controller': {}
          'f:labels':
            .: {}
            'f:app.kubernetes.io/instance': {}
            'f:app.kubernetes.io/name': {}
        'f:spec':
          'f:accessModes': {}
          'f:resources':
            'f:requests':
              .: {}
              'f:storage': {}
          'f:storageClassName': {}
          'f:volumeMode': {}
          'f:volumeName': {}
    - manager: kube-controller-manager
      operation: Update
      apiVersion: v1
      time: '2023-05-01T14:13:40Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:accessModes': {}
          'f:capacity':
            .: {}
            'f:storage': {}
          'f:phase': {}
      subresource: status
  namespace: ibm-spectrum-scale
  finalizers:
    - kubernetes.io/pvc-protection
  labels:
    app.kubernetes.io/instance: ibm-spectrum-scale
    app.kubernetes.io/name: pmcollector
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 25Gi
  volumeName: control-1.ncw-az1-005.caas.bbtnet.com-pmcollector
  storageClassName: ibm-spectrum-scale-internal
  volumeMode: Filesystem
status:
  phase: Bound
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 25Gi


==> However, when executing from pod ibm-spectrum-scale-pmcollector-0, the mountPath `/opt/IBM/zimon/data` where PVC/datadir-ibm-spectrum-scale-pmcollector-0 is mounted still shows that only 12K is used so far and 11G is the currently available space.

[C49904@openshift-eng-bastion-vm ~]$ oc rsh ibm-spectrum-scale-pmcollector-0
Defaulted container "pmcollector" out of: pmcollector, sysmon

sh-4.4$ df -Th | grep -iE 'size|zimon'
Filesystem     Type     Size  Used Avail Use% Mounted on
tmpfs          tmpfs     11G   12K   11G   1% /opt/IBM/zimon/config   

Version-Release number of selected component (if applicable):

OCP 4.10.21
isf-operator.v2.4.0  

How reproducible:

 

Steps to Reproduce:

1. by installing IBM Spectrum Scale 
2. 
3.

Actual results:

PVC size displayed from Openshift webconsole shows negative size value.

Expected results:

 
PVC size displayed from Openshift webconsole should not show negative size value.

Additional info:

 

 

Description of problem:

CR.status.lastSyncGeneration is not updated in STS mode (AWS). 

Steps to Reproduce:

See https://issues.redhat.com/browse/OCPBUGS-16684.

Description of problem:

When configuring an adminpolicybasedexternalroutes policy, if we use capital letters in the policy name, validation fails and blocks policy creation, which is great:

The AdminPolicyBasedExternalRoute "invalidIP" is invalid: metadata.name: Invalid value: "invalidIP": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')

If we forget to populate the next-hop section, validation again fails and the policy isn't created, which is also great:

$ oc apply -f 4.create.abp_static_NoHope.yaml 
The AdminPolicyBasedExternalRoute "invalid-no-nexthope-policy" is invalid: spec.nextHops: Required value


But if we set an invalid IP address in next-hop, no validation checks the proposed IPv4/v6 address(es); confirming valid IP addresses is IMHO worth adding, for the rare typos that might otherwise slip through unnoticed.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-04-143709

How reproducible:

Every time

Steps to Reproduce:

1. Deploy a cluster 

2. Try to create a static policy with an invalid IP address. It should fail, yet it doesn't report any error and proceeds with policy creation using the invalid IP address. I tested it on IPv4, but the same thing could also happen on IPv6.

$ cat 4.create.abp_static_invalidIP.yaml
apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
  name: invalidip
spec:
## gateway example
  from:
    namespaceSelector:
      matchLabels:
          kubernetes.io/metadata.name: bar
  nextHops:
    static:
      - ip: "1734.20.0.8"  <----- Invalid IP :)


$ oc apply -f 4.create.abp_static_invalidIP.yaml 
adminpolicybasedexternalroute.k8s.ovn.org/invalidip created  

And nooooo error message/no validations, this should fail here with a user error. 



[kni@provisionhost-0-0 ~]$ oc get adminpolicybasedexternalroutes.k8s.ovn.org 
NAME        LAST UPDATE   STATUS
invalidip                 
[kni@provisionhost-0-0 ~]$ oc describe adminpolicybasedexternalroutes.k8s.ovn.org invalidip 
Name:         invalidip
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  k8s.ovn.org/v1
Kind:         AdminPolicyBasedExternalRoute
Metadata:
  Creation Timestamp:  2023-10-31T08:50:58Z
  Generation:          1
  Resource Version:    11128481
  UID:                 99af3e73-00dd-408b-8238-397cc9a795bc
Spec:
  From:
    Namespace Selector:
      Match Labels:
        kubernetes.io/metadata.name:  bar
  Next Hops:
    Static:
      Bfd Enabled:  false
      Ip:           1734.20.0.8
Events:             <none>

We see above that the invalid IP was consumed as-is, which is wrong; the policy shouldn't have been applied to begin with.

Actual results:

A policy is created despite using an invalid IP address, see above. 

Expected results:

Policy creation should fail, with a notification of invalid IP address, same as we get when we try an invalid policy name. 

Additional info:

 

The TestMetrics e2e test is not correctly cleaning up the MachineConfigs and MachineConfigPools it creates. This means that other e2e tests which run after this e2e test can falsely fail or become flaky.

What's happening is this:

  1. The target node is removed from the ephemeral MachineConfigPool by unlabelling it.
  2. A race condition occurs when we call WaitForPoolComplete because technically, the pool is updated at this point since it has not yet picked up the unlabelling event from the target node.
  3. We delete the ephemeral MachineConfigPool, which deletes the rendered MachineConfigs that belong to it.
  4. The node starts the update process, but cannot find the rendered MachineConfigs for the ephemeral pool since they were deleted. The MCD degrades at this point and blocks the worker MachineConfigPool.

 

The cleanup flow should look like this:

  1. The target node is removed from the ephemeral MachineConfigPool by unlabeling it.
  2. Wait until the target node completes the switch back to the worker pool.
  3. Delete the ephemeral MachineConfigPool that was created for the test.
  4. Delete any MachineConfigs assigned to that ephemeral MachineConfigPool.

 

Description of problem:

After updating a CPMS CR with a non-existent network, a machine is stuck in the Provisioning state.
Then, when updating the CPMS back to the previous configuration, the master Machine is stuck in the Deleting state.

Logs from the machine api controller:
I0720 13:03:58.894171       1 controller.go:187] ostest-2pwfk-master-xwprn-0: reconciling Machine
I0720 13:03:58.902876       1 controller.go:231] ostest-2pwfk-master-xwprn-0: reconciling machine triggers delete
E0720 13:04:00.200290       1 controller.go:255] ostest-2pwfk-master-xwprn-0: failed to delete machine: filter matched no resources
E0720 13:04:00.200499       1 controller.go:329]  "msg"="Reconciler error" "error"="filter matched no resources" "controller"="machine-controller" "name"="ostest-2pwfk-master-xwprn-0" "namespace"="openshift-machine-api" "object"={"name":"ostest-2pwfk-master-xwprn-0","namespace":"openshift-machine-api"} "reconcileID"="9ccb5885-4b9f-4190-95a2-1120f2566c52"

Version-Release number of selected component (if applicable):

OCP 4.14.0-0.nightly-2023-07-18-085740
RHOS-17.1-RHEL-9-20230712.n.1

How reproducible:

100%

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

When executing oc mirror using an oci path, you can end up in an error state when the destination is a file://<path> destination (i.e. mirror to disk).

Version-Release number of selected component (if applicable):

4.14.2

How reproducible:

always

Steps to Reproduce:


At IBM we use the ibm-pak tool to generate an OCI catalog, but this bug is reproducible using a simple skopeo copy. Once you've copied the image locally you can move it around using file system copy commands to test this in different ways.

1. Make a directory structure like this to simulate how ibm-pak creates its own catalogs. The problem seems to be related to the path you use, so this represents the failure case:

mkdir -p /root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list

2. make a location where the local storage will live:

mkdir -p /root/.ibm-pak/oc-mirror-storage

3. Next, copy the image locally using skopeo:

skopeo copy docker://icr.io/cpopen/ibm-zcon-zosconnect-catalog@sha256:8d28189637b53feb648baa6d7e3dd71935656a41fd8673292163dd750ef91eec oci:///root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list --all --format v2s2

4. You can copy the OCI catalog content to a location where things will work properly so you can see a working example:

cp -r /root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list /root/ibm-zcon-zosconnect-catalog

5. You'll need an ISC... I've included both the oci references in the example (the commented out one works, but the oci:///root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list reference fails).

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
mirror:
  operators:
  - catalog: oci:///root/.ibm-pak/data/publish/latest/catalog-oci/manifest-list
  #- catalog: oci:///root/ibm-zcon-zosconnect-catalog
    packages:
    - name: ibm-zcon-zosconnect
      channels:
      - name: v1.0
    full: true
    targetTag: 27ba8e
    targetCatalog: ibm-catalog
storageConfig:
  local:
    path: /root/.ibm-pak/oc-mirror-storage

6. run oc mirror (remember the ISC has oci refs for good and bad scenarios). You may want to change your working directory to different locations between running the good/bad examples.

oc mirror --config /root/.ibm-pak/data/publish/latest/image-set-config.yaml file://zcon --dest-skip-tls --max-per-registry=6




Actual results:


Logging to .oc-mirror.log
Found: zcon/oc-mirror-workspace/src/publish
Found: zcon/oc-mirror-workspace/src/v2
Found: zcon/oc-mirror-workspace/src/charts
Found: zcon/oc-mirror-workspace/src/release-signatures
error: ".ibm-pak/data/publish/latest/catalog-oci/manifest-list/kubebuilder/kube-rbac-proxy@sha256:db06cc4c084dd0253134f156dddaaf53ef1c3fb3cc809e5d81711baa4029ea4c" is not a valid image reference: invalid reference format


Expected results:


Simple example where things were working with the oci:///root/ibm-zcon-zosconnect-catalog reference (this was executed in the same workspace so no new images were detected).

Logging to .oc-mirror.log
Found: zcon/oc-mirror-workspace/src/publish
Found: zcon/oc-mirror-workspace/src/v2
Found: zcon/oc-mirror-workspace/src/charts
Found: zcon/oc-mirror-workspace/src/release-signatures
3 related images processed in 668.063974ms
Writing image mapping to zcon/oc-mirror-workspace/operators.1700092336/manifests-ibm-zcon-zosconnect-catalog/mapping.txt
No new images detected, process stopping

Additional info:


I debugged the error that happened and captured one of the instances where the ParseReference call fails. This is only for reference to help narrow down the issue.

github.com/openshift/oc/pkg/cli/image/imagesource.ParseReference (/root/go/src/openshift/oc-mirror/vendor/github.com/openshift/oc/pkg/cli/image/imagesource/reference.go:111)
github.com/openshift/oc-mirror/pkg/image.ParseReference (/root/go/src/openshift/oc-mirror/pkg/image/image.go:79)
github.com/openshift/oc-mirror/pkg/cli/mirror.(*MirrorOptions).addRelatedImageToMapping (/root/go/src/openshift/oc-mirror/pkg/cli/mirror/fbc_operators.go:194)
github.com/openshift/oc-mirror/pkg/cli/mirror.(*OperatorOptions).plan.func3 (/root/go/src/openshift/oc-mirror/pkg/cli/mirror/operator.go:575)
golang.org/x/sync/errgroup.(*Group).Go.func1 (/root/go/src/openshift/oc-mirror/vendor/golang.org/x/sync/errgroup/errgroup.go:75)
runtime.goexit (/usr/local/go/src/runtime/asm_amd64.s:1594)

Also, I wanted to point out that because we use a period in the path (i.e. .ibm-pak) I wonder if that's causing the issue? This is just a guess and something to consider. *FOLLOWUP* ... I just removed the period from ".ibm-pak" and that seemed to make the error go away.

We use the state machine design pattern to have explicit clear rules for how hosts can move in and out of states depending on the things that are happening.

This makes it relatively easy to follow / understand host behavior.

We should ensure our code doesn't contain places where we force a host into a state without going through the state machine 🍝, otherwise it defeats the purpose of having a state machine.

One example that personally confused me is this switch statement, which contains updates like this one , this one and this one and also this one

This is a clone of issue OCPBUGS-19715. The following is the description of the original issue:

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-5969. The following is the description of the original issue:

Description of problem:

A Nutanix machine without enough memory is stuck in Provisioning, and machineset scale/delete does not work.

Version-Release number of selected component (if applicable):

Server Version: 
4.12.0
4.13.0-0.nightly-2023-01-17-152326

How reproducible:

Always

Steps to Reproduce:

1. Install Nutanix Cluster 
Template https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/tree/master/functionality-testing/aos-4_12/ipi-on-nutanix//versioned-installer
master_num_memory: 32768
worker_num_memory: 16384
networkType: "OVNKubernetes"
installer_payload_image: quay.io/openshift-release-dev/ocp-release:4.12.0-x86_64 2.
3. Scale up the cluster worker machineset from 2 replicas to 40 replicas
4. Install a Infra machinesets with 3 replicas, and a Workload machinesets with 1 replica
Refer to this doc https://docs.openshift.com/container-platform/4.11/machine_management/creating-infrastructure-machinesets.html#machineset-yaml-nutanix_creating-infrastructure-machinesets  and config the following resource
VCPU=16
MEMORYMB=65536
MEMORYSIZE=64Gi

Actual results:

1. The new infra machines are stuck in 'Provisioning' status for about 3 hours.

% oc get machines -A | grep Prov                                               
openshift-machine-api   qili-nut-big-jh468-infra-48mdt      Provisioning                                      175m
openshift-machine-api   qili-nut-big-jh468-infra-jnznv      Provisioning                                      175m
openshift-machine-api   qili-nut-big-jh468-infra-xp7xb      Provisioning                                      175m

2. Checking the Nutanix web console, I found 
infra machine 'qili-nut-big-jh468-infra-jnznv' had the following msg
"
No host has enough available memory for VM qili-nut-big-jh468-infra-48mdt (8d7eb6d6-a71e-4943-943a-397596f30db2) that uses 4 vCPUs and 65536MB of memory. You could try downsizing the VM, increasing host memory, power off some VMs, or moving the VM to a different host. Maximum allowable VM size is approximately 17921 MB
"

infra machine 'qili-nut-big-jh468-infra-jnznv' is not found

infra machine 'qili-nut-big-jh468-infra-xp7xb' is in green without warning.
But In must gather I found some error:
03:23:49openshift-machine-apinutanixcontrollerqili-nut-big-jh468-infra-xp7xbFailedCreateqili-nut-big-jh468-infra-xp7xb: reconciler failed to Create machine: failed to update machine with vm state: qili-nut-big-jh468-infra-xp7xb: failed to get node qili-nut-big-jh468-infra-xp7xb: Node "qili-nut-big-jh468-infra-xp7xb" not found

3. Scaling down the worker machineset from 40 replicas to 30 replicas does not work. There are still 40 Running worker machines and 40 Ready nodes after about 3 hours.

% oc get machinesets -A
NAMESPACE               NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   qili-nut-big-jh468-infra      3         3                             176m
openshift-machine-api   qili-nut-big-jh468-worker     30        30        30      30          5h1m
openshift-machine-api   qili-nut-big-jh468-workload   1         1                             176m

% oc get machines -A | grep worker| grep Running -c
40

% oc get nodes | grep worker | grep Ready -c
40

4. I deleted the infra machineset, but the machines are still in Provisioning status and won't get deleted

% oc delete machineset -n openshift-machine-api   qili-nut-big-jh468-infra
machineset.machine.openshift.io "qili-nut-big-jh468-infra" deleted

% oc get machinesets -A
NAMESPACE               NAME                          DESIRED   CURRENT   READY   AVAILABLE   AGE
openshift-machine-api   qili-nut-big-jh468-worker     30        30        30      30          5h26m
openshift-machine-api   qili-nut-big-jh468-workload   1         1                             3h21m

% oc get machines -A | grep -v Running
NAMESPACE               NAME                                PHASE          TYPE   REGION    ZONE              AGE
openshift-machine-api   qili-nut-big-jh468-infra-48mdt      Provisioning                                      3h22m
openshift-machine-api   qili-nut-big-jh468-infra-jnznv      Provisioning                                      3h22m
openshift-machine-api   qili-nut-big-jh468-infra-xp7xb      Provisioning                                      3h22m
openshift-machine-api   qili-nut-big-jh468-workload-qdkvd                                                     3h22m

Expected results:

The new infra machines should be either Running or Failed.
Cluster worker machineset scale up and down should not be impacted.

Additional info:

must-gather download url will be added to the comment.

This is a clone of issue OCPBUGS-14819. The following is the description of the original issue:

Description of problem:

alertmanager-trusted-ca-bundle, prometheus-trusted-ca-bundle, telemeter-trusted-ca-bundle, thanos-querier-trusted-ca-bundle are empty on the hosted cluster. This results in CMO not creating the prometheus CR, resulting in no prometheus pods. 

This issue prevents us from monitoring the hosted cluster. 

Version-Release number of selected component (if applicable):

4.13.z

How reproducible:

Rare: found only one occurrence for now.

Steps to Reproduce:

1.
2.
3.

Actual results:

Certs are not created, prometheus doesn't create prometheus pods

Expected results:

Certs are created and CMO can create prometheus pods

Additional info:

Linked Must Gather of the MC, inspect of the openshift-monitoring DP namespace

Description of the problem:

#!/bin/bash
while sleep 0.5; do
    for i in {1..10}; do
        curl -I -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" 'https://api.stage.openshift.com/api/assisted-install/v2/infra-envs/3dc00d41-46bf-4b83-9874-f21812263c97/downloads/files?discovery_iso_type=full-iso&file_name=discovery.ign' > /dev/null &
    done
done
 

 

The script above causes assisted-service CPU usage to spike and the 99th percentile of request latency to jump to 10s.

How reproducible:

100%

Steps to reproduce:

1. run script above

2. check response time/cpu usage

3.

Actual results:

response time really slow / 504

Expected results:

service continues to run smoothly

Description of problem:

Updating the k8s.io/* library versions to v0.27.2 in the cluster samples operator for the OCP 4.14 release.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

When setting no configuration for node-exporter in CMO config, we did not see the 2 arguments collector.netclass.ignored-devices and collector.netdev.device-exclude in node-exporter daemonset, full info see: http://pastebin.test.redhat.com/1093428

Checked in 4.13.0-0.nightly-2023-02-27-101545 with no configuration for node-exporter: the collector.netclass.ignored-devices setting is present,
see: http://pastebin.test.redhat.com/1093429

After disabling netdev/netclass on the bot cluster, we would see the collector.netclass.ignored-devices and collector.netdev.device-exclude settings in node-exporter. Since OCPBUGS-7282 was filed on 4.12, disabling netdev/netclass was not supported then, so I don't think we should disable netdev/netclass.

$ oc -n openshift-monitoring get ds node-exporter -oyaml | grep collector
        - --no-collector.wifi
        - --collector.filesystem.mount-points-exclude=^/(dev|proc|sys|run/k3s/containerd/.+|var/lib/docker/.+|var/lib/kubelet/pods/.+)($|/)
        - --collector.netclass.ignored-devices=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*|cali[a-f0-9]*)$
        - --collector.netdev.device-exclude=^(veth.*|[a-f0-9]{15}|enP.*|ovn-k8s-mp[0-9]*|br-ex|br-int|br-ext|br[0-9]*|tun[0-9]*|cali[a-f0-9]*)$
        - --collector.cpu.info
        - --collector.textfile.directory=/var/node_exporter/textfile
        - --no-collector.cpufreq
        - --no-collector.tcpstat
        - --no-collector.netdev
        - --no-collector.netclass
        - --no-collector.buddyinfo
        - '[[ ! -d /node_exporter/collectors/init ]] || find /node_exporter/collectors/init

Version-Release number of selected component (if applicable):

4.13

How reproducible:


Steps to Reproduce:

The 2 arguments are missing when booting up OCP with default configurations for CMO.

Actual results:

The 2 arguments collector.netclass.ignored-devices and collector.netdev.device-exclude are missing in node-exporter DaemonSet.

Expected results:

The 2 arguments collector.netclass.ignored-devices and collector.netdev.device-exclude are present in node-exporter DaemonSet.

Additional info:


Context:

As an SRE / cluster service / dev, I'd like the ability to identify trends in the duration of granular components that belong to HC/NodePools and that might affect our SLOs, e.g. etcd, infra, ignition, nodes.

DoD:

Add metrics to visualise the duration of component transitions.

Start with a few and agree on the approach.

Follow up.

Please review the following PR: https://github.com/openshift/cloud-provider-azure/pull/59

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Dockerfile.fast relies on picking up the `bin` directory built in the host for inclusion in the HyperShift Operator image for development.

Containerfile.operator, for RHTAP, relies on .dockerignore to prevent a `/bin` directory from being present in the podman build context with permissions that the user `default` (used by the golang build container) can't write to.

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

1. make docker-build-fast

Actual results:

COPY bin/* /usr/bin/ fails due to bin not being included in the podman build context

Expected results:

The container builds successfully

Additional info:

 

Description of problem:

IPI installation on Alibaba Cloud cannot succeed; zero control-plane nodes become ready.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

Always

Steps to Reproduce:

1. IPI installation on Alibabacloud, with "credentialsMode: Manual" 

Actual results:

Bootstrap failed, with all control-plane nodes NotReady.

Expected results:

The installation should succeed.

Additional info:

The log bundle is available at https://drive.google.com/file/d/1eb1D6GeNyu1Bys6vDyf3ev9aFjzWW6lW/view?usp=drive_link.

The installation of exactly the same scenario can succeed with 4.14.0-ec.4-x86_64.

Description of the problem:

Debug info is not printed for data collection

How reproducible:

Always

Steps to reproduce:

1. Deploy MCE multicluster-engine.v2.3.0-81. 

2. Enable log level debug for AI

3. Deploy spoke multinode 4.12

Actual results:

No debug info printed. 

Expected results:

should print debug info :
log.Debugf("Red Hat Insights Request ID: %+v", res.Header.Get("X-Rh-Insights-Request-Id"))

Description of problem:

Cluster Monitoring Operator (CMO) lacks golangci-lint checking and has several linter violations. The ones we'd specifically be interested in are the staticcheck ones, as they are tied to deprecated Go libraries.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

The Pipelines creation YAML form does not allow v1beta1 YAMLs to be created

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Open the Pipelines Creation YAML form
2. Paste the following YAML
3. Submit the form

Actual results:

The form does not submit, stating a version mismatch: expects v1, got v1beta1.

Expected results:

We must support creating both versions in the YAML form.

Additional info:

The issue is not observed when the "Import from YAML" Form is used.

Attachment: https://drive.google.com/file/d/1B_sAuGREgmX800JXGmrL30iByowfHzs7/view?usp=sharing
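For illustration, a minimal v1beta1 Pipeline of the kind the form rejects (a sketch, not the YAML from the attachment above):

apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: example-pipeline
spec:
  tasks:
    - name: echo
      taskSpec:
        steps:
          - name: echo
            image: registry.access.redhat.com/ubi8/ubi-minimal
            script: |
              echo "hello"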

 

Description of problem:

In the awsendpointservice CR, AWSEndpointAvailable is still True when the endpoint is deleted in the AWS console, and AWSEndpointServiceAvailable is still True when the endpoint service is deleted in the AWS console.

Version-Release number of selected component (if applicable):

 

How reproducible:

always

Steps to Reproduce:

1. Create a PublicAndPrivate or Private cluster, wait for cluster to come up
2. Check conditions in awsendpointservice cr, status of AWSEndpointAvailable and AWSEndpointServiceAvailable should be True
3. On AWS console delete endpoint
4. In awsendpointservice cr, check if condition AWSEndpointAvailable is changed to false 
5. On AWS console delete endpoint service
6. In awsendpointservice cr, check if condition AWSEndpointServiceAvailable is changed to false

Actual results:

status of AWSEndpointAvailable and AWSEndpointServiceAvailable is True

Expected results:

status of AWSEndpointAvailable and AWSEndpointServiceAvailable should be False

Additional info:
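For reference, after the endpoint and endpoint service are deleted in AWS, the conditions on the awsendpointservice CR would be expected to look roughly like this (a sketch; only the condition types and statuses come from the report):

status:
  conditions:
    - type: AWSEndpointAvailable
      status: "False"
    - type: AWSEndpointServiceAvailable
      status: "False"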

 

The ability to schedule workloads on master nodes is currently exposed via the REST API as a boolean Cluster property "schedulable_masters". For the Kubernetes API, we should align with other OpenShift APIs and have a boolean property in the ACM Spec called mastersSchedulable.
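A minimal sketch of what exposing this could look like, assuming the property lands on the AgentClusterInstall spec (the exact CRD and field placement are assumptions):

apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
  name: example-cluster
  namespace: example-namespace
spec:
  # mirrors the REST API's schedulable_masters boolean
  mastersSchedulable: true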

Description of problem:

When using oci-registries-config, oc-mirror panics.

Version-Release number of selected component (if applicable):

Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.14.0-202308091944.p0.gdba4a0c.assembly.stream-dba4a0c", GitCommit:"dba4a0cfd0a9fd29c1e4b5bc1da737e1153cc679", GitTreeState:"clean", BuildDate:"2023-08-10T00:13:31Z", GoVersion:"go1.20.5 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always 

Steps to Reproduce:

1.  mirror to localhost :
cat config.yaml 
apiVersion: mirror.openshift.io/v1alpha2
kind: ImageSetConfiguration
mirror:
  operators:
    - catalog: oci:///home1/oci-414
      packages:
      - name: cluster-logging
oc-mirror --config config.yaml docker://localhost:5000 --dest-use-http
2. use oci-registries-config 
`oc-mirror --config config.yaml docker://localhost:5000 --dest-use-http   --oci-registries-config /home1/registry.conf`

Actual results:

2. oc-mirror panics:
oc-mirror --config config.yaml docker://ec2-18-117-165-30.us-east-2.compute.amazonaws.com:5000  --dest-use-http   --oci-registries-config /home1/registry.conf 
Logging to .oc-mirror.log
Checking push permissions for ec2-18-117-165-30.us-east-2.compute.amazonaws.com:5000
Found: oc-mirror-workspace/src/publish
Found: oc-mirror-workspace/src/v2
Found: oc-mirror-workspace/src/charts
Found: oc-mirror-workspace/src/release-signatures
backend is not configured in config.yaml, using stateless mode
backend is not configured in config.yaml, using stateless mode
No metadata detected, creating new workspace
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x2e8a774]

goroutine 43 [running]:
github.com/containers/image/v5/docker.(*dockerImageSource).Close(0x3?)
	/go/src/github.com/openshift/oc-mirror/vendor/github.com/containers/image/v5/docker/docker_image_src.go:170 +0x14
github.com/openshift/oc-mirror/pkg/cli/mirror.findFirstAvailableMirror.func1()
	/go/src/github.com/openshift/oc-mirror/pkg/cli/mirror/fbc_operators.go:449 +0x42
github.com/openshift/oc-mirror/pkg/cli/mirror.findFirstAvailableMirror({0x4c67b38, 0xc0004ca230}, {0xc00ad56000, 0x1, 0x40d19c0?}, {0xc00077e000, 0x94}, {0xc00ac0f6b0, 0x24}, {0x0, ...})
	/go/src/github.com/openshift/oc-mirror/pkg/cli/mirror/fbc_operators.go:467 +0x6df
github.com/openshift/oc-mirror/pkg/cli/mirror.(*MirrorOptions).addRelatedImageToMapping(0xc0001c0f00, {0x4c67b38, 0xc0004ca230}, 0xc00ac13480?, {{0xc0074a14e8?, 0x18?}, {0xc0076563f0?, 0x8b?}}, {0xc000c5b580, 0x36})
	/go/src/github.com/openshift/oc-mirror/pkg/cli/mirror/fbc_operators.go:154 +0x3c5
github.com/openshift/oc-mirror/pkg/cli/mirror.(*OperatorOptions).plan.func3()
	/go/src/github.com/openshift/oc-mirror/pkg/cli/mirror/operator.go:570 +0x52
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/go/src/github.com/openshift/oc-mirror/vendor/golang.org/x/sync/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/go/src/github.com/openshift/oc-mirror/vendor/golang.org/x/sync/errgroup/errgroup.go:72 +0xa5
 

Expected results:

Should not panic.

Additional info:

Description of problem:

After the installation of a cluster based on the agent installer ISO is completed, the assisted-installer-controller job remains up.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Generate a valid ISO image using the agent installer. All kinds of topologies (compact/HA/SNO) and configurations are affected by this problem.

Steps to Reproduce:

1.
2.
3.

Actual results:

$ oc get jobs -n assisted-installer
NAME                            COMPLETIONS   DURATION   AGE
assisted-installer-controller   0/1           102m       102m

Expected results:

oc get jobs -n assisted-installer should not return any job

Additional info:

It looks like the assisted-installer-controller has been designed assuming that the Assisted Service (AS) is always available and reachable. This is not necessarily true when using the agent installer, since the AS initially running on the rendezvous node will no longer be available after that node is rebooted.

The assisted-installer-controller performs a number of different tasks internally, and from the logs not all of them complete successfully (a condition for terminating the job).
It could be useful to do deeper troubleshooting on the ApproveCsrs task, as it is one that does not terminate properly.

 

 

 

Please review the following PR: https://github.com/openshift/machine-api-provider-gcp/pull/44

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

4.13.0-RC.6 enters "Cluster status: error" while trying to install a cluster with the agent-based installer.
After the read-disk stage the cluster status turns to "error".

Version-Release number of selected component (if applicable):


How reproducible:

Create an image with the attached install-config and agent-config files and boot a node with this image.

Steps to Reproduce:

1. Create an image with the attached install-config and agent-config files and boot a node with this image.

Actual results:

Cluster status: error

Expected results:

Should continue with cluster status: installing 

Additional info:


Description of problem:
Starting with OpenShift 4.13 we show a copy button next to the OpenShift Route URL in the topology, the route list and the detail page. But the Knative Route URL doesn't show this button, as Vikram mentioned in this code review https://github.com/openshift/console/pull/12853#issuecomment-1594829827

Version-Release number of selected component (if applicable):
4.13+

How reproducible:
Always

Steps to Reproduce:

  1. Install OpenShift Serverless operator
  2. Import an application as Knative Service
  3. Open the Service in the topology sidebar

Actual results:
Copy button is not shown

Expected results:
Copy button should be displayed

Additional info:

Description of problem:

As a cluster-admin, users can see the Pipelines section while using the `import from git` feature in Developer mode in the web console.

However, if users log in as a normal user or a project admin, they are not able to see the Pipelines section.

Version-Release number of selected component (if applicable):

Tested in OCP v4.12.18 and v4.12.20 

How reproducible:

Always

Steps to Reproduce:

Prerequisite- Install Red Hat OpenShift pipelines operator
1. Login as a kube-admin user from web console
2. Go to Developer View
3. Click on +Add
4. Under Git Repository, open page -> Import from git
5. Enter Git Repo URL (example git url- https://github.com/spring-projects/spring-petclinic)
6. Check if there are 3 section : General , Pipelines , Advance options
7. Then Login as a project admin user
8. Perform all the steps again from step 2 to step 6

Actual results:

The Pipelines section is not visible when logged in as a project admin. Only the General and Advanced options sections are visible in import from git.
However, the Pipelines section is visible as a cluster-admin.

Expected results:

The Pipelines section should be visible when logged in as a project admin, along with the General and Advanced options sections in import from git.

Additional info:

I checked by creating separate rolebindings and clusterrolebindings to assign access to pipeline resources, like below:
~~~
$ oc create clusterrole pipelinerole1 --verb=create,get,list,patch,delete --resource=tektonpipelines,openshiftpipelinesascodes
$ oc create clusterrole pipelinerole2 --verb=create,get,list,patch,delete --resource=repositories,pipelineruns,pipelines
$ oc adm policy add-cluster-role-to-user pipelinerole1 user1
$ oc adm policy add-role-to-user pipelinerole2 user1
~~~
However, even after assigning these rolebindings/clusterrolebindings to the users, they are not able to see the Pipelines section.

Description of problem:

https://issues.redhat.com//browse/OCPBUGS-10342 tracked the issue where the number of replicas exceeded the number of hosts. However, the check does not detect the case where the number of hosts exceeds the number of replicas, as it was not counting the hosts correctly. Fix it to detect this case correctly.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Set compute replicas in install-config.yaml
2. Add hosts in agent-config.yaml - 3 with role of master and more than 2 with role of worker.
3. The installation will fail and the following error can be seen in the journal:
Jun 12 01:10:57 master-0 start-cluster-installation.sh[3879]: Hosts known and ready for cluster installation (5/3) 
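For illustration, the mismatch looks roughly like this (a sketch with placeholder hostnames; the replica counts are chosen to match the 5/3 journal message above):

# install-config.yaml (excerpt) -- 3 control-plane replicas, 0 compute replicas
controlPlane:
  replicas: 3
compute:
  - name: worker
    replicas: 0
---
# agent-config.yaml (excerpt) -- 5 hosts defined, so 5/3 are reported ready
apiVersion: v1alpha1
kind: AgentConfig
hosts:
  - hostname: master-0
    role: master
  - hostname: master-1
    role: master
  - hostname: master-2
    role: master
  - hostname: worker-0
    role: worker
  - hostname: worker-1
    role: worker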

Actual results:

No warning regarding the number of configured hosts

Expected results:

A warning about the number of configured hosts not matching the replicas.

Additional info:

 

Description of problem:

While creating a deployment, if an image stream is added, then in edit-deployment the Save button will not be enabled until the image stream tag is changed.

On clicking the Reload button, the Save button is automatically enabled.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1. Search Deployment under resources
2. create deployment with Image stream
3. edit deployment 

Actual results:

In edit deployment, the Save button remains disabled when any value is changed.

Expected results:

In edit deployment, the Save button should be enabled when any value is changed.

Video Link - https://drive.google.com/file/d/1luqcjQS5Azc0XRjpMNfKKqbXYSc17Rxc/view?usp=share_link

Description of problem:

When there is no public DNS zone, the lookup fails during install. During the installation of a private cluster, there is no need for a public zone.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

FATAL failed to fetch Terraform Variables: failed to generate asset "Terraform Variables": failed to get GCP public zone: no matching public DNS Zone found

Expected results:

Installation complete 

Additional info:
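For context, a private GCP cluster is requested with publish: Internal, in which case no public zone lookup should be needed. An illustrative install-config excerpt with placeholder values:

# install-config.yaml (excerpt)
baseDomain: example.internal
publish: Internal
platform:
  gcp:
    projectID: my-project
    region: us-central1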

 

When the user specifies the 'vendor' hint, it actually checks for the value of the 'model' hint in the vendor field.

Description of problem:

For https://issues.redhat.com//browse/OCPBUGS-4998, additional logging was added to the wait-for command when the state is in pending-user-action in order to show the particular host errors preventing installation. This additional host info should be added at the WARNING level.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Test this in the same way as bug https://issues.redhat.com//browse/OCPBUGS-4998, i.e. by swapping the boot order of the disks
2. When the log message with additional info is logged, it is logged at DEBUG level, for example:
DEBUG Host master-2 Expected the host to boot from disk, but it booted the installation image - please reboot and fix boot order to boot from disk Virtual_disk 6000c295b246decdbb4f4e691c185fcf (sda, /dev/disk/by-id/wwn-0x6000c295b246decdbb4f4e691c185fcf)
INFO cluster has stopped installing... working to recover installation
3. This has now been changed to log at WARNING level
4. In addition, multiple messages are logged:
"level=info msg=cluster has stopped installing... working to recover installation". This will change to log it only one time.

Actual results:

 

Expected results:

1. The message is now logged at WARNING level
2. Only one message for "cluster has stopped installing... working to recover installation" will appear

Additional info:

 

Description of problem:

ControllerConfig renders properly until Infrastructure object changes, then:
- 'Kind' and 'APIVersion' are no longer present on the object resulting from a "get" for that object via the lister and
- as a result, the embedded dns and infrastructure objects in ControllerConfig fail to validate 
- this results in ControllerConfig failing to sync 

Version-Release number of selected component (if applicable):

4.14 machine-config-operator

How reproducible:

I can reproduce it every time 

Steps to Reproduce:

1.Build a 4.14 cluster
2.Update Infrastructure non-destructively, e.g.: oc annotate infrastructure cluster break.the.mco=yep
3.Watch the machine-config-operator pod logs (or oc get co, the error will propagate) to see the validation errors for the new controllerconfig

Actual results:

2023-05-17T20:45:04.627320107Z I0517 20:45:04.627281       1 event.go:285] Event(v1.ObjectReference{Kind:"", Namespace:"", Name:"machine-config", UID:"d52d09f4-f7bb-497a-a5c3-92861aa6796f", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'OperatorDegraded: MachineConfigControllerFailed' Failed to resync 4.14.0-0.ci.test-2023-05-17-193937-ci-op-dcrr8kjq-latest because: ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: [spec.infra.apiVersion: Required value: must not be empty, spec.infra.kind: Required value: must not be empty, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]

Expected results:

machine-config-operator quietly syncs controllerconfig :) 

Additional info:

The MCO itself is not doing this. It's not part of resourcemerge or anything like that. It's happening "below" us. 

The short version here is that when using a typed client, the group,version,kind (GVK) gets stripped during decoding because it's redundant (you already know the type). For "top level" objects, it gets put back during an update request automatically, but it doesn't recurse into embedded objects (which Infrastructure and DNS are). So we end up with embedded objects that are missing explicit GVKs and won't validate. 

Why does it only happen after the objects change? We're using a lister, and the lister's "strip-on-decode" behavior seems a little inconsistent. Sometimes the GVK is populated. If you use a direct client "get", the GVK will never be populated. 

There is a lot of history behind this behavior and it won't be changed any time soon; here are some entry points:
- https://github.com/kubernetes/kubernetes/pull/63972
- https://github.com/kubernetes/kubernetes/issues/80609
 

Description of problem:

While mirroring the nvidia operator with the oc-mirror 4.13 version, the ImageContentSourcePolicy is not getting created properly.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Create imageset file

kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
archiveSize: 4
storageConfig:
  local:
    path: /home/name/nvidia
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/certified-operator-index:v4.11
    packages:
    - name: nvidia-network-operator

2. mirror to disk using oc-mirror 4.13
$oc-mirror -c imageset.yaml file:///home/name/nvidia/
./oc-mirror version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.13.0-202307242035.p0.gf11a900.assembly.stream-f11a900", GitCommit:"f11a9001caad8fe146c73baf2acc38ddcf3642b5", GitTreeState:"clean", BuildDate:"2023-07-24T21:25:46Z", GoVersion:"go1.19.10 X:strictfipsruntime", Compiler:"gc", Platform:"linux/amd64"}

3. Now generate the manifest

$ oc-mirror --from /home/name/nvidia/ docker://registry:8443 --manifests-only

- mirrors:
    - registry:8443/nvidia/cloud-native
    source: nvcr.io/nvidia

However the correct mapping should be (see the ImageContentSourcePolicy sketch after these steps):
    - mirrors:
        - registry/nvidia
      source: nvcr.io/nvidia

4. perform same step with 4.12.0 version you will not hit this issue. 
./oc-mirror version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.12.0-202304241542.p0.g5fc00fe.assembly.stream-5fc00fe", GitCommit:"5fc00fe735d8fb3b6125f358f5d6b9fe726fad10", GitTreeState:"clean", BuildDate:"2023-04-24T16:01:29Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}
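For illustration, an ImageContentSourcePolicy built around the expected mapping would look roughly like this (a sketch; the registry host/port and the policy name are placeholders):

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: operator-0
spec:
  repositoryDigestMirrors:
    - mirrors:
        - registry:8443/nvidia
      source: nvcr.io/nvidia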

 

 

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

I have completed installing OCP with 3 masters and 2 workers.
But I was not able to find the mastersSchedulable parameter in any of the files in the manifests directory after running the command below.
$ openshift-install agent create cluster-manifests  --log-level debug --dir kni

And I used this installer:
https://github.com/openshift/installer/releases/tag/agent-installer-v4.11.0-dev-preview-2

Version-Release number of selected component (if applicable):

 

How reproducible:

Execute the installer.

Steps to Reproduce:

1. download the installer
2. openshift-install agent create cluster-manifests  --log-level debug --dir kni 

Actual results:

There is no mastersSchedulable parameter

Expected results:

Some file (like cluster-scheduler-02-config.yml) should contain the mastersSchedulable parameter.

Additional info:

 

Description of problem:

When we try to create a cluster with --secret-creds (an MCE AWS k8s secret that includes aws-creds, the pull secret, and the base domain), the binary should not ask for the pull secret. However, it does now, after the change from the hypershift CLI.

Adding the pull-secret param allows the command to continue as expected, though the whole point of secret-creds is to reuse what already exists.

/usr/local/bin/hcp create cluster aws --name acmqe-hc-ad5b1f645d93464c --secret-creds test1-cred --region us-east-1 --node-pool-replicas 1 --namespace local-cluster --instance-type m6a.xlarge --release-image quay.io/openshift-release-dev/ocp-release:4.14.0-ec.4-multi --generate-ssh

Output:
  Error: required flag(s) "pull-secret" not set
  required flag(s) "pull-secret" not set
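For reference, the secret passed via --secret-creds is an MCE AWS credential secret roughly along these lines (the key names are assumptions based on typical MCE cloud credential secrets, and all values are placeholders):

apiVersion: v1
kind: Secret
metadata:
  name: test1-cred
  namespace: local-cluster
type: Opaque
stringData:
  aws_access_key_id: <redacted>
  aws_secret_access_key: <redacted>
  baseDomain: example.devcluster.example.com
  pullSecret: '{"auths": {"...": {}}}'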

Version-Release number of selected component (if applicable):

2.4.0-DOWNANDBACK-2023-08-31-13-34-02 or mce 2.4.0-137

hcp version openshift/hypershift: 8b4b52925d47373f3fe4f0d5684c88dc8a93368a. Latest supported OCP: 4.14.0

How reproducible:

always

Steps to Reproduce:

  1. download hcp cli from mce
  2. run hcp cluster create aws with valid secret-creds param
  3. ...

Actual results:

Expected results:

Additional info:

Description of problem:

When CNO is managed by HyperShift, its deployment has the "hypershift.openshift.io/release-image" template metadata annotation. The annotation's value is used to track the progress of cluster control-plane version upgrades. Example:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      generation: 24
      labels:
        hypershift.openshift.io/managed-by: control-plane-operator
      name: cluster-network-operator
      namespace: master-cg319sf10ghnddkvo8j0
    ...
    spec:
      progressDeadlineSeconds: 600
      ...
      template:
        metadata:
          annotations:
            hypershift.openshift.io/release-image: us.icr.io/armada-master/ocp-release:4.12.7-x86_64
            target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
      ...

The same annotation must be set by CNO on the multus-admission-controller deployment so that service providers can track its version upgrades as well.

CNO needs a code fix to implement this annotation propagation logic.
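In other words, the multus-admission-controller deployment rendered by CNO should end up carrying the same annotation, roughly like this excerpt (reusing the values from the example above):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: multus-admission-controller
  namespace: master-cg319sf10ghnddkvo8j0
spec:
  template:
    metadata:
      annotations:
        hypershift.openshift.io/release-image: us.icr.io/armada-master/ocp-release:4.12.7-x86_64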

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1.Create OCP cluster using Hypershift
2.Check deployment template metadata annotations on multus-admission-controller

Actual results:

No "hypershift.openshift.io/release-image" deployment template metadata annotation exists 

Expected results:

"hypershift.openshift.io/release-image" annotation must be present

Additional info:

 

 

The following install-config fields are new in 4.13:

  • cpuPartitioning
  • platform.baremetal.loadBalancer
  • platform.vsphere.loadBalancer

These fields are ignored by the agent-based installation method. Until such time as they are implemented, we should print a warning if they are set to non-default values, as we do for other fields that are ignored.
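An illustrative install-config excerpt with these fields set to non-default values, which should trigger the warning (the values are examples; the CPU partitioning field appears in install-config as cpuPartitioningMode):

# install-config.yaml (excerpt)
cpuPartitioningMode: AllNodes
platform:
  baremetal:
    loadBalancer:
      type: UserManaged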

This is a clone of issue OCPBUGS-23300. The following is the description of the original issue:

Description of problem:

Actually the issue has the same root cause as https://issues.redhat.com/browse/OCPBUGS-9026, but I'd like to open a new one since the issue has become very critical after ROSA started using NLB as the default in 4.14. HCP (HyperShift) private clusters without infra nodes are the most serious victims because they have worker nodes only and no workaround is currently available.

But if we think we could use the old bug to track the issue, then please close this one.


Version-Release number of selected component (if applicable):

4.14.1
HyperShift Private cluster

How reproducible:

100%

Steps to Reproduce:

1. create ROSA HCP(HyperShift) cluster
2. run qe-e2e-test on this cluster, or curl route from one pod inside the cluster
3.

Actual results:

1. co/console status is flapping since the route is only intermittently accessible
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.1    True        False         4h56m   Error while reconciling 4.14.1: the cluster operator console is not available


2. check node and router pods running on both worker nodes
$ oc get node
NAME                          STATUS   ROLES    AGE    VERSION
ip-10-0-49-184.ec2.internal   Ready    worker   5h5m   v1.27.6+f67aeb3
ip-10-0-63-210.ec2.internal   Ready    worker   5h8m   v1.27.6+f67aeb3

$ oc -n openshift-ingress get pod -owide
NAME                              READY   STATUS    RESTARTS   AGE    IP           NODE                          NOMINATED NODE   READINESS GATES
router-default-86d569bf84-bq66f   1/1     Running   0          5h8m   10.130.0.7   ip-10-0-49-184.ec2.internal   <none>           <none>
router-default-86d569bf84-v54hp   1/1     Running   0          5h8m   10.128.0.9   ip-10-0-63-210.ec2.internal   <none>           <none>

3. check ingresscontroller LB setting, it uses Internal NLB

spec:
  endpointPublishingStrategy:
    loadBalancer:
      dnsManagementPolicy: Managed
      providerParameters:
        aws:
          networkLoadBalancer: {}
          type: NLB
        type: AWS
      scope: Internal
    type: LoadBalancerService

4. continue to curl the route from a pod inside the cluster
$ oc rsh console-operator-86786df488-w6fks
Defaulted container "console-operator" out of: console-operator, conversion-webhook-server

sh-4.4$ curl https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com -k -I
HTTP/1.1 200 OK

sh-4.4$ curl https://console-openshift-console.apps.rosa.ci-rosa-h-d53b.ptk5.p3.openshiftapps.com -k -I
Connection timed out

Expected results:

1. co/console should be stable; curl of the console route should always succeed.
2. qe-e2e-test should not fail

Additional info:

qe-e2e-test on the cluster:

https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/openshift_release/45369/rehearse-45369-periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-stable-aws-rosa-sts-hypershift-sec-guest-prod-private-link-full-f2/1724307074235502592
 

Description of problem:

We can see TypeErrors on the operand creation page.

Version-Release number of selected component (if applicable):

cluster-bot cluster 
launch 4.14-ci,openshift/console#12525

How reproducible:

Always

Steps to Reproduce:

1. Create mock CRD and CSV files in project 'test'
$ oc project test
$ oc apply -f mock-crd-and-csv.yaml 
customresourcedefinition.apiextensions.k8s.io/mock-k8s-dropdown-resources.test.tectonic.com created
clusterserviceversion.operators.coreos.com/mock-k8s-resource-dropdown-operator created
2. Go to the CR creation page: Operators -> Installed Operators -> Mock K8sResourcePrefixOperator -> Mock Resource tab -> click on the 'Create MockK8sDropdownResource' button

Actual results:

2. we can see errors

Description:
e is undefined

Component trace: 
g@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/create-operand-chunk-b03c5cb69a738de3ba86.min.js:1:17026
v@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/create-operand-chunk-b03c5cb69a738de3ba86.min.js:1:54359
div
N@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:173048
R@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:173543
_@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/create-operand-chunk-b03c5cb69a738de3ba86.min.js:1:20749
10807/t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/create-operand-chunk-b03c5cb69a738de3ba86.min.js:1:145
4156/t.default@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/create-operand-chunk-b03c5cb69a738de3ba86.min.js:1:22586
s@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:223444
t@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:21:69403
T
t@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:21:71448
Suspense
i@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:435931
section
m@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:170312
div
div
t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1501506
div
div
c@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:699298
d@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:219161
div
d@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:89596
l@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1151500
H<@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:442786
S@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:87:86675
main
div
v@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:466912
div
div
c@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:311348
div
div
c@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:699298
d@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:219161
div
d@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendor-patternfly-core-chunk-277c96b9c656c5dae20f.min.js:1:89596
Jn@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:36:185686
t.default@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:854425
5404/t.default@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/quick-start-chunk-0b68859d1eaa39849249.min.js:1:1264
s@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:223444
t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1581508
ee@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1599747
St@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:36:142700
ee@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1599747
ee@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1599747
i@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:809765
t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1575685
t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1575874
t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1573290
te@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1599889
ne<@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1603021
r@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:36:122338
t@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:21:69403
t@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:21:71448
t@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:21:66008
re@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1603332
t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:783751
t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:1084331
s@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/main-chunk-4f4f3b36aabdf0eb831f.min.js:1:635039
t.a@https://console-openshift-console.apps.ci-ln-ykgji4b-76ef8.origin-ci-int-aws.dev.rhcloud.com/static/vendors~main-chunk-8e90f77cf4a58a9d5a52.min.js:135:257437
Suspense


Expected results:

2. operand creation form/yaml page should be loaded successfully

Additional info:

mock-crd-and-csv.yaml and screenshot are at https://drive.google.com/drive/folders/1Z432vVMArHLgCgzu5IMGi9_oq3iRtezx 

Description of problem:

Changes to platform fields, e.g. the AWS instance type, don't trigger a rolling upgrade.

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:

1. Create a hostedCluster with nodepool on AWS
2. Change the instance type field in the NodePool spec.platform.aws

Actual results:

Machines are not recreated and the instance type doesn't change.

Expected results:

Machines are recreated with the new instance type

Additional info:

This is a result of the recent changes to CAPI, which introduced in-place propagation of labels and annotations.
Solution:
The MachineTemplate name should not be constant and should change with each spec change, so that spec.infraRef in the MachineDeployment is updated and a rolling upgrade is triggered.
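For example, a NodePool edit like the following should recreate the machines with the new instance type (a sketch with placeholder names):

apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: example-nodepool
  namespace: clusters
spec:
  platform:
    type: AWS
    aws:
      # changing this value should roll the machines
      instanceType: m5.xlarge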

Description of problem:

DNS local endpoint preference is not working for TCP DNS requests on OpenShift SDN.

Reference code: https://github.com/openshift/sdn/blob/b58a257b896d774e0a092612be250fb9414af5ca/vendor/k8s.io/kubernetes/pkg/proxy/iptables/proxier.go#L999-L1012

This is where the DNS request is short-circuited to the local DNS endpoint if it exists. This is important because DNS local preference protects against another outstanding bug, in which daemonset pods go stale for a few seconds upon node shutdown (see https://issues.redhat.com/browse/OCPNODE-549 for the fix for graceful node shutdown). This appears to be contributing to DNS issues in our internal CI clusters. https://lookerstudio.google.com/reporting/3a9d4e62-620a-47b9-a724-a5ebefc06658/page/MQwFD?s=kPTlddLa2AQ shows large amounts of "dns_tcp_lookup" failures, which I attribute to this bug.

UDP DNS local preference is working fine in OpenShift SDN. Both UDP and TCP local preference work fine in OVN. It's just TCP DNS local preference that is not working in OpenShift SDN.

Version-Release number of selected component (if applicable):

4.13, 4.12, 4.11

How reproducible:

100%

Steps to Reproduce:

1. oc debug -n openshift-dns
2. dig +short +tcp +vc +noall +answer CH TXT hostname.bind
# Retry multiple times, and you should always get the same local DNS pod.

Actual results:

[gspence@gspence origin]$ oc debug -n openshift-dns
Starting pod/image-debug ...
Pod IP: 10.128.2.10
If you don't see a command prompt, try pressing enter.
sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind
"dns-default-glgr8"
sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind
"dns-default-gzlhm"
sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind
"dns-default-dnbsp"
sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind
"dns-default-gzlhm"

Expected results:

[gspence@gspence origin]$ oc debug -n openshift-dns
Starting pod/image-debug ...
Pod IP: 10.128.2.10
If you don't see a command prompt, try pressing enter.
sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind
"dns-default-glgr8"
sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind
"dns-default-glgr8"
sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind
"dns-default-glgr8"
sh-4.4# dig +short +tcp +vc +noall +answer CH TXT hostname.bind
"dns-default-glgr8" 

Additional info:

https://issues.redhat.com/browse/OCPBUGS-488 is the previous bug I opened for UDP DNS local preference not working.

iptables-save from a 4.13 vanilla cluster bot AWS,SDN: https://drive.google.com/file/d/1jY8_f64nDWi5SYT45lFMthE0vhioYIfe/view?usp=sharing 

Description of problem:

OKD/FCOS uses FCOS as its bootimage, i.e. when booting cluster nodes for the first time during installation. FCOS does not provide tools such as the OpenShift Client (oc) or hyperkube, which are used during single-node cluster installation at first boot (e.g. oc in bootkube.sh), and thus setup fails.
 

Version-Release number of selected component (if applicable):

4.14

Description of problem:

SNO installation performed with the assisted-installer failed 

Version-Release number of selected component (if applicable):

4.10.32
# oc get co authentication -o yaml
- lastTransitionTime: '2023-01-30T00:51:11Z'
    message: 'IngressStateEndpointsDegraded: No subsets found for the endpoints of
      oauth-server      OAuthServerConfigObservationDegraded: secret "v4-0-config-system-router-certs"
      not found      OAuthServerDeploymentDegraded: 1 of 1 requested instances are unavailable for
      oauth-openshift.openshift-authentication (container is waiting in pending oauth-openshift-58b978d7f8-s6x4b
      pod)      OAuthServerRouteEndpointAccessibleControllerDegraded: secret "v4-0-config-system-router-certs"

# oc logs ingress-operator-xxx-yyy -c ingress-operator 
2023-01-30T08:14:13.701799050Z 2023-01-30T08:14:13.701Z ERROR   operator.certificate_publisher_controller       certificate-publisher/controller.go:80  failed to list ingresscontrollers for secret    {"related": "", "error": "Index with name field:defaultCertificateName does not exist"}

Restarting the ingress-operator pod helped fix the issue, but a permanent fix is required.

The Bug(https://bugzilla.redhat.com/show_bug.cgi?id=2005351) was filed earlier but closed due to inactivity.

 

 

Description of problem:

The MCDaemon has a codepath for "pivot" that was used in older versions and is also used, via solutions articles, to initiate a direct pivot to an ostree version, mostly when things fail.

As of 4.12 this codepath should no longer work because we switched to the new-format OSImage, so we should fully deprecate it.

This is likely where it fails:
https://github.com/openshift/machine-config-operator/blob/ecc6bf3dc21eb33baf56692ba7d54f9a3b9be1d1/pkg/daemon/rpm-ostree.go#L248

Version-Release number of selected component (if applicable):

4.12+

How reproducible:

Not sure but should be 100%

Steps to Reproduce:

1. Follow https://access.redhat.com/solutions/5598401
2.
3.

Actual results:

fails

Expected results:

MCD telling you pivot is deprecated

Additional info:

 

User Story:

When changing platform fields, e.g. the AWS instance type, we trigger a rolling upgrade; however, nothing is signalled in the NodePool state, which results in bad UX.

NodePools should signal a rolling upgrade caused by platform changes.

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

Please review the following PR: https://github.com/openshift/configmap-reload/pull/51

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-13061.

This is a clone of issue OCPBUGS-18439. The following is the description of the original issue:

Description of problem:

In the developer sandbox, the happy path to create operator-backed resources is broken.

Users can only work in their assigned namespace. When doing so and attempting to create an Operator-backed resource from the Developer console, the user interface inadvertently switches the working namespace from the user's to the `openshift` one. The console shows an error message when the user clicks the "Create" button.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Login to the Developer Sandbox
2. Choose the Developer view
3. Click Add+ -> Developer Catalog -> Operator Backed
4. Filter by "integration"
5. Notice the working namespace is still the user's one. 
6. Select "Integration" (Camel K operator)
7. Click "Create"
8. Notice the working namespace has switched to `openshift`
9. Notice the custom resource in YAML view includes `namespace: openshift`
10. Click "Create"


Actual results:

An error message shows: "Danger alert:An error occurredintegrations.camel.apache.org is forbidden: User "bmesegue" cannot create resource "integrations" in API group "camel.apache.org" in the namespace "openshift""

Expected results:

On step 8, the working namespace should remain the user's one.
On step 9, in the YAML view, the namespace should be the user's one, or none.
After step 10, the creation process should trigger the creation of a Camel K integration.

Additional info:

 

Improve the logging format of KNI haproxy logs to display tcplogs plus frontend IP and frontend port.

The current logging format is not very verbose:

<134>Jun  2 22:54:02 haproxy[11]: Connect from ::1:42424 to ::1:9445 (main/TCP)
<134>Jun  2 22:54:04 haproxy[11]: Connect from ::1:42436 to ::1:9445 (main/TCP)
<134>Jun  2 22:54:04 haproxy[11]: Connect from ::1:42446 to ::1:9445 (main/TCP)

It lacks critical information for troubleshooting, such as load-balancing destination and timestamps.
https://www.haproxy.com/blog/introduction-to-haproxy-logging recommends the following for tcp mode:

When in TCP mode, which is set by adding mode tcp, you should also add [option tcplog](https://www.haproxy.com/documentation/hapee/1-8r1/onepage/#option%20tcplog).

Description of problem:

In a 4.14 nightly HyperShift hosted cluster, aws-pod-identity does not work. Pods are not injected with the env vars AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE.

In 4.13 HyperShift hosted cluster, it works well, see Additional info.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

Always

Steps to Reproduce:

1.
$ export KUBECONFIG=/path/to/hypershift-hosted-cluster/kubeconfig
$ ogcv
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-08-11-055332   True        False         8h      Cluster version is 4.14.0-0.nightly-2023-08-11-055332
$ oc get mutatingwebhookconfigurations --context admin
NAME               WEBHOOKS   AGE
aws-pod-identity   1          6h5m

$ oc get --raw=/.well-known/openid-configuration | jq -r '.issuer'
https://xxxx.s3.us-east-2.amazonaws.com/hypershift-xxxx

2.
$ oc new-project xxia-proj
$ oc create sa aws-provider
serviceaccount/aws-provider created

3.
$ ccoctl aws create-iam-roles --name=xxia --region=$REGION --credentials-requests-dir=credentialsrequest-dir-aws --identity-provider-arn=arn:aws:iam::xxxx:oidc-provider/xxxx.s3.us-east-2.amazonaws.com/hypershift-xxxx --output-dir=credrequests-ccoctl-output
2023/08/24 17:54:32 Role arn:aws:iam::xxxx:role/xxia-xxia-proj-aws-creds created
2023/08/24 17:54:32 Saved credentials configuration to: credrequests-ccoctl-output/manifests/xxia-proj-aws-creds-credentials.yaml
2023/08/24 17:54:32 Updated Role policy for Role xxia-xxia-proj-aws-creds

4.
$ oc annotate sa/aws-provider eks.amazonaws.com/role-arn="arn:aws:iam::xxxx:role/xxia-xxia-proj-aws-creds"
$ oc create deployment aws-cli --image=amazon/aws-cli --dry-run=client -o yaml -- sleep 360d | sed "/containers/i \      serviceAccountName: aws-provider" | oc create -f -
deployment.apps/aws-cli created
$ oc get po
NAME                               READY   STATUS              RESTARTS   AGE
aws-cli-5c4f6d7d5b-g6d5v           1/1     Running             0          18s

5.
$ oc rsh aws-cli-5c4f6d7d5b-g6d5v
sh-4.2$ env | grep AWS
sh-4.2$ ls /var/run/secrets/eks.amazonaws.com/serviceaccount/token
ls: cannot access /var/run/secrets/eks.amazonaws.com/serviceaccount/token: No such file or directory
sh-4.2$ exit
command terminated with exit code 1

Actual results:

5. No AWS env vars.

Expected results:

5. Should have AWS env vars.
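For reference, the aws-pod-identity webhook is expected to mutate the pod roughly as in this excerpt (a sketch based on the standard EKS pod identity webhook behavior; the volume name is an assumption):

spec:
  containers:
    - name: aws-cli
      env:
        - name: AWS_ROLE_ARN
          value: arn:aws:iam::xxxx:role/xxia-xxia-proj-aws-creds
        - name: AWS_WEB_IDENTITY_TOKEN_FILE
          value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
      volumeMounts:
        - name: aws-iam-token
          mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
  volumes:
    - name: aws-iam-token
      projected:
        sources:
          - serviceAccountToken:
              audience: sts.amazonaws.com
              path: token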

Additional info:

In 4.13 HyperShift hosted cluster, it works well:

1.
$ ogcv    
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-08-11-101506   True        False         10h     Cluster version is 4.13.0-0.nightly-2023-08-11-101506
$ oc get --raw=/.well-known/openid-configuration | jq -r '.issuer'
https://aos-xxxx.s3.us-east-2.amazonaws.com/xxxx
$ oc get no                       
NAME                                        STATUS   ROLES    AGE   VERSION
ip-10-0-139-76.us-east-2.compute.internal   Ready    worker   10h   v1.26.6+6bf3f75
...
$ REGION=us-east-2

2.
$ oc new-project xxia-proj
$ oc create sa aws-provider

3.
$ ccoctl aws create-iam-roles --name=xxia-test --region=$REGION --credentials-requests-dir=credentialsrequest-dir-aws --identity-provider-arn=arn:aws:iam::xxxx:oidc-provider/aos-xxxx.s3.us-east-2.amazonaws.com/xxxx --output-dir=credrequests-ccoctl-output
2023/08/24 20:06:53 Role arn:aws:iam::xxxx:role/xxia-test-xxia-proj-aws-creds created 
2023/08/24 20:06:53 Saved credentials configuration to: credrequests-ccoctl-output/manifests/xxia-proj-aws-creds-credentials.yaml
2023/08/24 20:06:53 Updated Role policy for Role xxia-test-xxia-proj-aws-creds

4.
$ oc annotate sa/aws-provider eks.amazonaws.com/role-arn="arn:aws:iam::xxxx:role/xxia-test-xxia-proj-aws-creds"
$ oc create deployment aws-cli --image=amazon/aws-cli --dry-run=client -o yaml -- sleep 360d | sed "/containers/i \      serviceAccountName: aws-provider" | oc create -f -
$ oc get pod               
NAME                       READY   STATUS    RESTARTS   AGE
aws-cli-84875995cc-svszl   1/1     Running   0          16s

5.
$ oc rsh aws-cli-84875995cc-svszl
sh-4.2$ env | grep AWS
AWS_ROLE_ARN=arn:aws:iam::xxxx:role/xxia-test-xxia-proj-aws-creds
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_DEFAULT_REGION=us-east-2
AWS_REGION=us-east-2

Description of problem:

The PipelineRun default template name has been updated in the backend in Pipelines operator 1.10, so we need to update the name in the UI code as well.

 

https://github.com/openshift/console/blob/master/frontend/packages/pipelines-plugin/src/components/pac/const.ts#L9

 

Description of problem:

Authorization in OpenShift Container Platform 4 is not working as expected when using the system:serviceaccounts Group in a ClusterRoleBinding.

Here, one would assume that every serviceAccount would be granted the permissions to access the defined resources, but access is actually denied.

$ curl -k -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <token>" --data "@/tmp/post.json" https://api.<url>:6443/apis/authorization.k8s.io/v1/subjectaccessreviews
{
  "kind": "SubjectAccessReview",
  "apiVersion": "authorization.k8s.io/v1",
  "metadata": {
    "creationTimestamp": null,
    "managedFields": [
      {
        "manager": "curl",
        "operation": "Update",
        "apiVersion": "authorization.k8s.io/v1",
        "time": "2023-03-13T09:17:45Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:spec": {
            "f:resourceAttributes": {
              ".": {},
              "f:group": {},
              "f:name": {},
              "f:namespace": {},
              "f:resource": {},
              "f:verb": {}
            },
            "f:user": {}
          }
        }
      }
    ]
  },
  "spec": {
    "resourceAttributes": {
      "namespace": "project-100",
      "verb": "use",
      "group": "sharedresource.openshift.io",
      "resource": "sharedsecrets",
      "name": "shared-subscription"
    },
    "user": "system:serviceaccount:project-100:builder"
  },
  "status": {
    "allowed": false
  }
}

When specifying the serviceAccount in the ClusterRoleBinding access is granted:

$ oc get clusterrolebinding shared-secret-cluster-role-binding -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"name":"shared-secret-cluster-role-binding"},"roleRef":{"apiGroup":"rbac.authorization.k8s.io","kind":"ClusterRole","name":"shared-secret-cluster-role"},"subjects":[{"apiGroup":"rbac.authorization.k8s.io","kind":"Group","name":"system:serviceaccounts"}]}
  creationTimestamp: "2023-03-13T08:59:46Z"
  name: shared-secret-cluster-role-binding
  resourceVersion: "1575464"
  uid: dd11825d-834a-4807-ab82-30dc0a415985
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: shared-secret-cluster-role
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
- kind: ServiceAccount
  name: builder
  namespace: project-101

$ curl -k -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <token>" --data "@/tmp/post.json" https://api.<url>:6443/apis/authorization.k8s.io/v1/subjectaccessreviews
{
  "kind": "SubjectAccessReview",
  "apiVersion": "authorization.k8s.io/v1",
  "metadata": {
    "creationTimestamp": null,
    "managedFields": [
      {
        "manager": "curl",
        "operation": "Update",
        "apiVersion": "authorization.k8s.io/v1",
        "time": "2023-03-13T09:16:47Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:spec": {
            "f:resourceAttributes": {
              ".": {},
              "f:group": {},
              "f:name": {},
              "f:namespace": {},
              "f:resource": {},
              "f:verb": {}
            },
            "f:user": {}
          }
        }
      }
    ]
  },
  "spec": {
    "resourceAttributes": {
      "namespace": "project-101",
      "verb": "use",
      "group": "sharedresource.openshift.io",
      "resource": "sharedsecrets",
      "name": "shared-subscription"
    },
    "user": "system:serviceaccount:project-101:builder"
  },
  "status": {
    "allowed": true,
    "reason": "RBAC: allowed by ClusterRoleBinding \"shared-secret-cluster-role-binding\" of ClusterRole \"shared-secret-cluster-role\" to ServiceAccount \"builder/project-101\""
  }
}

Both namespaces exist and have the serviceAccount automatically created.

$ oc get sa -n project-100
NAME       SECRETS   AGE
builder    1         11m
default    1         11m
deployer   1         11m

$ oc get sa -n project-101
NAME       SECRETS   AGE
builder    1         4m1s
default    1         4m1s
deployer   1         4m

The only difference is how authorization is granted. For project-101 the serviceAccount is explicitly listed, while for project-100 authorization should be granted via the Group called system:serviceaccounts.

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.12.5

How reproducible:

Always

Steps to Reproduce:

1. Install OpenShift Container Platform 4.12
2. Create SharedSecret CRD using oc apply -f https://raw.githubusercontent.com/openshift/api/master/sharedresource/v1alpha1/0000_10_sharedsecret.crd.yaml
3. Create SharedSecret resource:
$ oc get sharedsecret shared-subscription -o yaml
apiVersion: sharedresource.openshift.io/v1alpha1
kind: SharedSecret
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"sharedresource.openshift.io/v1alpha1","kind":"SharedSecret","metadata":{"annotations":{},"name":"shared-subscription"},"spec":{"secretRef":{"name":"etc-pki-entitlement","namespace":"openshift-config-managed"}}}
  creationTimestamp: "2023-03-13T08:54:48Z"
  generation: 1
  name: shared-subscription
  resourceVersion: "1567499"
  uid: 15c350aa-0de1-4a02-b876-9b822ba0afe5
spec:
  secretRef:
    name: etc-pki-entitlement
    namespace: openshift-config-managed
4. Create ClusterRole to grant access to SharedSecret:
$ oc get clusterrole shared-secret-cluster-role -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRole","metadata":{"annotations":{},"name":"shared-secret-cluster-role"},"rules":[{"apiGroups":["sharedresource.openshift.io"],"resourceNames":["shared-subscription"],"resources":["sharedsecrets"],"verbs":["use"]}]}
  creationTimestamp: "2023-03-13T08:57:24Z"
  name: shared-secret-cluster-role
  resourceVersion: "1568481"
  uid: 99324722-ac62-4bb8-a7fe-7ac915393e19
rules:
- apiGroups:
  - sharedresource.openshift.io
  resourceNames:
  - shared-subscription
  resources:
  - sharedsecrets
  verbs:
  - use
5. Create ClusterRoleBinding to access SharedSecret
$ oc get clusterrolebinding shared-secret-cluster-role-binding -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"rbac.authorization.k8s.io/v1","kind":"ClusterRoleBinding","metadata":{"annotations":{},"name":"shared-secret-cluster-role-binding"},"roleRef":{"apiGroup":"rbac.authorization.k8s.io","kind":"ClusterRole","name":"shared-secret-cluster-role"},"subjects":[{"apiGroup":"rbac.authorization.k8s.io","kind":"Group","name":"system:serviceaccounts"}]}
  creationTimestamp: "2023-03-13T08:59:46Z"
  name: shared-secret-cluster-role-binding
  resourceVersion: "1575464"
  uid: dd11825d-834a-4807-ab82-30dc0a415985
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: shared-secret-cluster-role
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:serviceaccounts
- kind: ServiceAccount
  name: builder
  namespace: project-101
6. Run SubjectAccessReview call to validate authoriztion:
$ curl -k -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <token>" --data "@/tmp/post.json" https://api.<url>:6443/apis/authorization.k8s.io/v1/subjectaccessreviews
{
  "kind": "SubjectAccessReview",
  "apiVersion": "authorization.k8s.io/v1",
  "metadata": {
    "creationTimestamp": null,
    "managedFields": [
      {
        "manager": "curl",
        "operation": "Update",
        "apiVersion": "authorization.k8s.io/v1",
        "time": "2023-03-13T09:17:45Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:spec": {
            "f:resourceAttributes": {
              ".": {},
              "f:group": {},
              "f:name": {},
              "f:namespace": {},
              "f:resource": {},
              "f:verb": {}
            },
            "f:user": {}
          }
        }
      }
    ]
  },
  "spec": {
    "resourceAttributes": {
      "namespace": "project-100",
      "verb": "use",
      "group": "sharedresource.openshift.io",
      "resource": "sharedsecrets",
      "name": "shared-subscription"
    },
    "user": "system:serviceaccount:project-100:builder"
  },
  "status": {
    "allowed": false
  }
}

Actual results:

$ curl -k -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <token>" --data "@/tmp/post.json" https://api.<url>:6443/apis/authorization.k8s.io/v1/subjectaccessreviews
{
  "kind": "SubjectAccessReview",
  "apiVersion": "authorization.k8s.io/v1",
  "metadata": {
    "creationTimestamp": null,
    "managedFields": [
      {
        "manager": "curl",
        "operation": "Update",
        "apiVersion": "authorization.k8s.io/v1",
        "time": "2023-03-13T09:17:45Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:spec": {
            "f:resourceAttributes": {
              ".": {},
              "f:group": {},
              "f:name": {},
              "f:namespace": {},
              "f:resource": {},
              "f:verb": {}
            },
            "f:user": {}
          }
        }
      }
    ]
  },
  "spec": {
    "resourceAttributes": {
      "namespace": "project-100",
      "verb": "use",
      "group": "sharedresource.openshift.io",
      "resource": "sharedsecrets",
      "name": "shared-subscription"
    },
    "user": "system:serviceaccount:project-100:builder"
  },
  "status": {
    "allowed": false
  }
}
 

Expected results:

$ curl -k -X POST -H "Content-Type: application/json" -H "Authorization: Bearer <token>" --data "@/tmp/post.json" https://api.<url>:6443/apis/authorization.k8s.io/v1/subjectaccessreviews
{
  "kind": "SubjectAccessReview",
  "apiVersion": "authorization.k8s.io/v1",
  "metadata": {
    "creationTimestamp": null,
    "managedFields": [
      {
        "manager": "curl",
        "operation": "Update",
        "apiVersion": "authorization.k8s.io/v1",
        "time": "2023-03-13T09:16:47Z",
        "fieldsType": "FieldsV1",
        "fieldsV1": {
          "f:spec": {
            "f:resourceAttributes": {
              ".": {},
              "f:group": {},
              "f:name": {},
              "f:namespace": {},
              "f:resource": {},
              "f:verb": {}
            },
            "f:user": {}
          }
        }
      }
    ]
  },
  "spec": {
    "resourceAttributes": {
      "namespace": "project-101",
      "verb": "use",
      "group": "sharedresource.openshift.io",
      "resource": "sharedsecrets",
      "name": "shared-subscription"
    },
    "user": "system:serviceaccount:project-101:builder"
  },
  "status": {
    "allowed": true,
    "reason": "RBAC: allowed by ClusterRoleBinding \"shared-secret-cluster-role-binding\" of ClusterRole \"shared-secret-cluster-role\" to ServiceAccount \"builder/project-101\""
  }
}
 

Additional info:

The goal is to use the Group "system:serviceaccounts" to authorize all ServiceAccounts to access the given resources, avoiding the need to list every namespace explicitly (and therefore the need for a controller that keeps such a list up to date).
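For reference, a minimal sketch of the /tmp/post.json payload used in the curl calls above, reconstructed from the spec echoed back in the responses (namespace and user vary per test):

{
  "kind": "SubjectAccessReview",
  "apiVersion": "authorization.k8s.io/v1",
  "spec": {
    "resourceAttributes": {
      "namespace": "project-101",
      "verb": "use",
      "group": "sharedresource.openshift.io",
      "resource": "sharedsecrets",
      "name": "shared-subscription"
    },
    "user": "system:serviceaccount:project-101:builder"
  }
}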
 

Description of problem:

The AdditionalTrustBundle field in install-config.yaml can be used to add additional certs; however, these certs are only propagated to the final image when the ImageContentSources field is also set for mirroring. If mirroring is not set, then the additional certs will be on the bootstrap node but not in the final image.

This can cause a problem when a user has set up a proxy and wants to add additional certs as described here: https://docs.openshift.com/container-platform/4.12/networking/configuring-a-custom-pki.html#installation-configure-proxy_configuring-a-custom-pki

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. In install-config.yaml set additionalTrustBundle and don't set imageContentSources.
2. Do an installation using the install-config.yaml.
3. After the final image is installed and rebooted view the certs in /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt. 

Actual results:

The certs defined in additionalTrustBundle are not in /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt.

Expected results:

The certs defined in additionalTrustBundle will be in /etc/pki/ca-trust/source/anchors/openshift-config-user-ca-bundle.crt even when imageContentSources is not defined.

Additional info:
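For illustration, a minimal install-config.yaml excerpt (certificate contents are placeholders) matching the reproduction steps: additionalTrustBundle is set while imageContentSources is omitted:

additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  <your additional CA certificate>
  -----END CERTIFICATE-----
# note: no imageContentSources section is defined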

 

Description of problem:

agent-gather script does not collect agent-tui logs

Version-Release number of selected component (if applicable):

 

How reproducible:

Log in to a node (before bootstrap is completed) and run the agent-gather script

Steps to Reproduce:

1. ssh into one of the nodes
2. run agent-gather
3. Check the content of the produced tar artifacts

Actual results:

The agent-gather-*.tar.xz does not contain agent-tui logs

Expected results:

The agent-gather-*.tar.xz must contain /var/log/agent/agent-tui.log

Additional info:

agent-tui logs are essential for troubleshooting any issue that could happen during bootstrap and affect the agent-tui console.
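A quick way to verify whether the log was collected (a sketch; the archive name varies per run):

$ tar -tJf agent-gather-<timestamp>.tar.xz | grep agent-tui.log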

Description of problem:
New machines got stuck in Provisioned state when the customer tried to scale the machineset.
~~~
NAME PHASE TYPE REGION ZONE AGE
ocp4-ftf8t-worker-2-wn6lp Provisioned 44m
ocp4-ftf8t-worker-redhat-x78s5 Provisioned 44m
~~~

Upon checking the journalctl logs from these VMs, we noticed that it was failing with "no space left on the device" errors while pulling images.

To troubleshoot the issue further we had to break the root password in order to log in and investigate.

Once the root password was broken, we logged in to the system and checked the journalctl logs for failure errors.
We could see "no space left on device" for image pulls. Checking the df -h output, we could see that /dev/sda4 (/dev/mapper/coreos-luks-root-nocrypt), which is mounted on /sysroot, was 100% full.
As images would fail to get pulled, the machine-config-daemon-firstboot.service would not complete. This prevented the node from getting to 4.12 and from joining the cluster.
The rest of the errors were side effects of the "no space left on device" error.
We could see that /dev/sda4 was correctly partitioned to 120 GiB. We compared to a working system and the partition scheme matched.
The filesystem was only 2.8 GiB instead of 120 GiB.
We manually extended the filesystem for / (xfs_growfs /), after which the / mount was resized to 120 GiB.
The node was rebooted once this step was performed and the system came up fine with 4.12 Red Hat CoreOS.
We waited a while for the node to come up with kubelet and crio running, approved the certs, and now the node is part of the cluster.

Later while checking the logs for RCA, we observed below errors from the logs which might help in determining why the sysroot mountpoint was not resized.
~~~
$ grep -i growfs sos_commands/logs/journalctl_no-pager_-since_-3days
Jun 12 10:37:30 ocp4-ftf8t-worker-2-wn6lp systemd[1]: ignition-ostree-growfs.service: Failed to load configuration: No such file or directory <---
Jun 12 10:37:30 ocp4-ftf8t-worker-2-wn6lp systemd[1]: ignition-ostree-growfs.service: Collecting.
~~~

Version-Release number of selected component (if applicable):
OCP 4.12.18.
IPI installation on RHV.

How reproducible:
Not able to reproduce the issue.

Steps to Reproduce:

1.
2.
3.

Actual results:
The /sysroot mountpoint was not resized to the actual size of the /dev/sda4 partition which further prevented the machine-config-daemon-firstboot.service from completing and the node was stuck at RHCOS version 4.6.

Currently the customer has to manually resize the /sysroot mountpoint every time they add a new node to the cluster, as a workaround.

Expected results:
The /sysroot mountpoint should be automatically resized as a part of ignition-ostree-growfs.sh script.

Additional info:
The customer recently migrated from an old storage domain to a new one on RHV, if that matters. However, they performed successful MachineSet scale-up tests with the new storage domain on OCP 4.11.33 (before upgrading OCP).
They started facing the issue with all MachineSets (new and existing) only after upgrading OCP to 4.12.18.
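For reference, the manual workaround described above amounts to the following on the affected node (a sketch, assuming an XFS root filesystem on /dev/sda4 as in this case):

# confirm the filesystem is smaller than the underlying partition
$ df -h /sysroot
$ lsblk /dev/sda4
# grow the XFS filesystem to fill the partition
$ sudo xfs_growfs /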

This is a clone of issue OCPBUGS-20246. The following is the description of the original issue:

Description of problem:

Installing an IPv6 agent-based hosted cluster in a disconnected environment. The hosted control plane is available, but when using its kubeconfig to run oc commands against the hosted cluster, I'm getting 

E1009 08:05:34.000946  115216 memcache.go:265] couldn't get current server API group list: Get "https://fd2e:6f44:5dd8::58:31765/api?timeout=32s": dial tcp [fd2e:6f44:5dd8::58]:31765: i/o timeout
  

Version-Release number of selected component (if applicable):

OCP 4.14.0-rc.4

How reproducible:

100%

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

I can use oc commands against the hosted cluster

Additional info:

 

This is a clone of issue OCPBUGS-26412. The following is the description of the original issue:

This is a clone of issue OCPBUGS-23362. The following is the description of the original issue:

A hostedcluster/hostedcontrolplane were stuck uninstalling. Inspecting the CPO logs, it showed that

"error": "failed to delete AWS default security group: failed to delete security group sg-04abe599e5567b025: DependencyViolation: resource sg-04abe599e5567b025 has a dependent object\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"

Unfortunately, I do not have enough access to the AWS account to inspect this security group, though I know it is the default worker security group because it's recorded in the hostedcluster .status.platform.aws.defaultWorkerSecurityGroupID

Version-Release number of selected component (if applicable):

4.14.1

How reproducible:

I haven't tried to reproduce it yet, but can do so and update this ticket when I do. My theory is:

Steps to Reproduce:

1. Create an AWS HostedCluster, wait for it to create/populate defaultWorkerSecurityGroupID
2. Attach the defaultWorkerSecurityGroupID to anything else in the AWS account unrelated to the HCP cluster
3. Attempt to delete the HostedCluster

Actual results:

CPO logs:
"error": "failed to delete AWS default security group: failed to delete security group sg-04abe599e5567b025: DependencyViolation: resource sg-04abe599e5567b025 has a dependent object\n\tstatus code: 400, request id: f776a43f-8750-4f04-95ce-457659f59095"
HostedCluster Status Condition
  - lastTransitionTime: "2023-11-09T22:18:09Z"
    message: ""
    observedGeneration: 3
    reason: StatusUnknown
    status: Unknown
    type: CloudResourcesDestroyed

Expected results:

I would expect that the CloudResourcesDestroyed status condition on the hostedcluster would reflect this security group as holding up the deletion instead of having to parse through logs.

Additional info:
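As a sketch (requires CLI access to the AWS account), the dependent object can usually be located by listing the network interfaces that still reference the security group:

$ aws ec2 describe-network-interfaces \
    --filters Name=group-id,Values=sg-04abe599e5567b025 \
    --query 'NetworkInterfaces[].{Id:NetworkInterfaceId,Description:Description}'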

 

This is a clone of issue OCPBUGS-18464. The following is the description of the original issue:

Description of problem:

Hide the Builds NavItem if BuildConfig is not installed in the cluster

Description of problem:

When deploying KafkaMirrorMaker through the OLM form (in the AMQ Streams and Strimzi operators) we have to specify fields that already have defaults and are optional:

  • Liveness Probe
  • Readiness Probe
  • Tracing

For all other components it's correct.

Version-Release number of selected component (if applicable):

4.6
4.7
4.8
4.9

How reproducible:

Steps to Reproduce:
1. Deploy Strimzi 0.27.0 or AMQ Streams 1.8.4 via OLM
2. Try to deploy KafkaMirrorMaker via Form view without any changes

Actual results:
CR cannot be created because several required fields (all are in Liveness probe, Readiness probe and Tracing part) are not filled.

Expected results:
CR will be created, because all required fields are set (whitelist/include, kafka bootstrap address and replicas count, nothing else is needed)

Additional info:

Description of problem:

CVO is observing a panic and throwing the following error:

Interface conversion: cache.DeletedFinalStateUnknown is not v1.Object: missing method GetAnnotations

Linking the job https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1687876857824808960 

Observed on other jobs https://search.ci.openshift.org/?search=cache.DeletedFinalStateUnknown+is+not+v1.Object&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-20368. The following is the description of the original issue:

Description of problem:

Automate E2E tests of Dynamic OVS Pinning. This bug is created for merging 

https://github.com/openshift/cluster-node-tuning-operator/pull/746

Version-Release number of selected component (if applicable):

4.15.0

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-api-provider-gcp/pull/193

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Error message seen during testing:
2023-03-23T22:33:02.507Z	ERROR	operator.dns_controller	dns/controller.go:348	failed to publish DNS record to zone	{"record": {"dnsName":"*.example.com","targets":["34.67.189.132"],"recordType":"A","recordTTL":30,"dnsManagementPolicy":"Managed"}, "dnszone": {"id":"ci-ln-95xvtb2-72292-9jj4w-private-zone"}, "error": "googleapi: Error 400: Invalid value for 'entity.change.additions[*.example.com][A].name': '*.example.com', invalid"}

Version-Release number of selected component (if applicable):

4.13

How reproducible:


Steps to Reproduce:

1. Setup 4.13 gcp cluster, install OSSM using http://pastebin.test.redhat.com/1092754
2. Run gateway api e2e against cluster (or create gateway with listener hostname *.example.com)
3. Check ingress operator logs

Actual results:

DNS record not published, and continuous errors in the log

Expected results:

Should publish DNS record to zone without errors

Additional info:

Miciah: The controller should check ManageDNSForDomain when calling EnsureDNSRecord.  

This is a clone of issue OCPBUGS-19411. The following is the description of the original issue:

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. oc -n openshift-machine-api get role/cluster-autoscaler-operator -o yaml
2. Observe missing watch verb
3. Tail cluster-autoscaler logs to see error

status.go:444] No ClusterAutoscaler. Reporting available.
I0919 16:40:52.877216       1 status.go:244] Operator status available: at version 4.14.0-rc.1
E0919 16:40:53.719592       1 reflector.go:148] github.com/openshift/client-go/config/informers/externalversions/factory.go:101: Failed to watch *v1.ClusterOperator: unknown (get clusteroperators.config.openshift.io) 

Actual results:

 

Expected results:

 

Additional info:
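For illustration, a sketch of the kind of RBAC rule the operator's role (or a ClusterRole, since ClusterOperator is cluster-scoped) would need so the informer can watch ClusterOperators; the exact manifest lives in the cluster-autoscaler-operator repository:

- apiGroups:
  - config.openshift.io
  resources:
  - clusteroperators
  verbs:
  - get
  - list
  - watch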

 

Description of problem:

When creating an image for arm, i.e. using:
  architecture: arm64

and running
$ ./bin/openshift-install agent create image --dir ./cluster-manifests/ --log-level debug

the output indicates that the correct base ISO was extracted from the release:
INFO Extracting base ISO from release payload     
DEBUG Using mirror configuration                   
DEBUG Fetching image from OCP release (oc adm release info --image-for=machine-os-images --insecure=true --icsp-file=/tmp/icsp-file347546417 registry.ci.openshift.org/origin/release:4.13) 
DEBUG extracting /coreos/coreos-aarch64.iso to /home/bfournie/.cache/agent/image_cache, oc image extract --path /coreos/coreos-aarch64.iso:/home/bfournie/.cache/agent/image_cache --confirm --icsp-file=/tmp/icsp-file3609464443 registry.ci.openshift.org/origin/4.13-2023-03-09-142410@sha256:e3c4445cabe16ca08c5b874b7a7c9d378151eb825bacc90e240cfba9339a828c 
INFO Base ISO obtained from release and cached at /home/bfournie/.cache/agent/image_cache/coreos-aarch64.iso 
DEBUG Extracted base ISO image /home/bfournie/.cache/agent/image_cache/coreos-aarch64.iso from release payload 

In fact, the ISO was not extracted from the release image, and the command failed:
ERROR failed to write asset (Agent Installer ISO) to disk: cannot generate ISO image due to configuration errors 
FATAL failed to fetch Agent Installer ISO: failed to generate asset "Agent Installer ISO": provided device /home/bfournie/.cache/agent/image_cache/coreos-aarch64.iso does not exist

Version-Release number of selected component (if applicable):

4.13

How reproducible:

every time

Steps to Reproduce:

1. Set architecture: arm64  for all hosts in install-config.yaml 
2. Run the openshift-install command as above
3. See the log messages and the command fails

Actual results:

Misleading messages are logged and the command fails

Expected results:

Command succeeds

Additional info:

 

We want to parametrize the Envoy ConfigMap name: with that, we can configure a private Envoy configuration, which brings the following advantages:

  • private infra details
  • changing envoy config can be done with app-interface MR only

Description of problem:

The Create BuildConfig button in the Dev console Builds page opens the form view, but in the default namespace

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Go to the Dev perspective
2. Click on Builds
3. Click on "Create BuildConfig"

Actual results:

"default" namespace is selected in the namespace selector

Expected results:

It should open the form in the active namespace

Additional info:

 

Description of problem:

Currently, only one ServerGroup is created in OpenStack when 3 masters are deployed across 3 AZs, while 3 should have been created (one per AZ). With the work on CPMS, we made the decision to only create one ServerGroup for the masters. However, this will require a change in the installer to reflect this decision.
Indeed, when specifying AZs, the master machines would reference their own ServerGroups, while only one actually existed in OpenStack. This was a mistake, but instead of fixing that bug, we'll change the behaviour to have only one ServerGroup for masters.

Version-Release number of selected component (if applicable):

latest (4.14)

How reproducible: deploy a control plane with 3 failure domains:

controlPlane:
  name: master
  platform:
    openstack:
      type: m1.xlarge
      failureDomains:
      - computeAvailabilityZone: az0
      - computeAvailabilityZone: az1
      - computeAvailabilityZone: az2

Steps to Reproduce:

1. Deploy the control plane in 3 AZ
2. List OpenStack Compute Server Groups

Actual results:

+--------------------------------------+--------------------------+--------------------+
| ID                                   | Name                     | Policy             |
+--------------------------------------+--------------------------+--------------------+
| 0750c579-d2cf-41b3-9e88-003dcbcad0c5 | refarch-jkn8g-master-az0 | soft-anti-affinity |
| 05715c08-ac2b-439d-9bd5-5803ac40c322 | refarch-jkn8g-worker     | soft-anti-affinity |
+--------------------------------------+--------------------------+--------------------+

Expected results without our work on CPMS:

refarch-jkn8g-master-az1 and refarch-jkn8g-master-az2 should have been created.

This expectation is purely for documentation, QE should ignore it.

 

Expected results with our work on CPMS (which should be taken in account by QE when testing CPMS):

refarch-jkn8g-master-az0 should not exist, and the ServerGroup should be named refarch-jkn8g-master.
All the masters should use that ServerGroup in both the Nova instance properties and in the MachineSpec once the machines are enrolled by CCPMSO.

Description of problem:

Create a private Shared VPC cluster on AWS, Ingress operator degraded due to the following error:

2023-06-14T09:55:50.240Z	INFO	operator.dns_controller	controller/controller.go:118	reconciling	{"request": {"name":"default-wildcard","namespace":"openshift-ingress-operator"}}
2023-06-14T09:55:50.363Z	ERROR	operator.dns_controller	dns/controller.go:354	failed to publish DNS record to zone	{"record": {"dnsName":"*.apps.ci-op-2x6lics3-849ce.qe.devcluster.openshift.com.","targets":["internal-ac656ce4d29f64da289152053f50c908-1642793317.us-east-1.elb.amazonaws.com"],"recordType":"CNAME","recordTTL":30,"dnsManagementPolicy":"Managed"}, "dnszone": {"id":"Z0698684SM2RRJSYHP43"}, "error": "failed to get hosted zone for load balancer target \"internal-ac656ce4d29f64da289152053f50c908-1642793317.us-east-1.elb.amazonaws.com\": couldn't find hosted zone ID of ELB internal-ac656ce4d29f64da289152053f50c908-1642793317.us-east-1.elb.amazonaws.com"}


ingress operator:
ingress                                                                         False       True          True       37m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DNSReady=False (FailedZones: The record failed to provision in some zones: [{Z0698684SM2RRJSYHP43 map[]}])

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-13-223353 

How reproducible:

always

Steps to Reproduce:

1. Create a private Shared VPC cluster on AWS using STS

Actual results:

ingress operator degraded

Expected results:

cluster is healthy

Additional info:

A public cluster has no such issue.

As discussed in https://issues.redhat.com/browse/MON-1634, adding ownerref will be put on hold for now until CMO has a CR.

 

 

In the meantime we'll add (hopefully temporary) labels to emphasize ownership; this will help guide users for now and help us highlight relations and how we can/want to express them using ownerref in the future (see option 1 and option 2 in the doc above).

Description of problem:

OCP 4.13 uses a release candidate (v3.0.0-rc.1) of vsphere-csi-driver. We should ship OCP with a GA version.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-03-17-161027

Description of problem:

ACLs are disabled for all newly created S3 buckets; this causes all OCP installs to fail because the bootstrap Ignition config cannot be uploaded:

level=info msg=Creating infrastructure resources...
level=error
level=error msg=Error: error creating S3 bucket ACL for yunjiang-acl413-4dnhx-bootstrap: AccessControlListNotSupported: The bucket does not allow ACLs
level=error msg=	status code: 400, request id: HTB2HSH6XDG0Q3ZA, host id: V6CrEgbc6eyfJkUbLXLxuK4/0IC5hWCVKEc1RVonSbGpKAP1RWB8gcl5dfyKjbrLctVlY5MG2E4=
level=error
level=error msg=  with aws_s3_bucket_acl.ignition,
level=error msg=  on main.tf line 62, in resource "aws_s3_bucket_acl" "ignition":
level=error msg=  62: resource "aws_s3_bucket_acl" ignition {
level=error
level=error msg=failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "bootstrap" stage: failed to create cluster: failed to apply Terraform: exit status 1
level=error
level=error msg=Error: error creating S3 bucket ACL for yunjiang-acl413-4dnhx-bootstrap: AccessControlListNotSupported: The bucket does not allow ACLs
level=error msg=	status code: 400, request id: HTB2HSH6XDG0Q3ZA, host id: V6CrEgbc6eyfJkUbLXLxuK4/0IC5hWCVKEc1RVonSbGpKAP1RWB8gcl5dfyKjbrLctVlY5MG2E4=
level=error
level=error msg=  with aws_s3_bucket_acl.ignition,
level=error msg=  on main.tf line 62, in resource "aws_s3_bucket_acl" "ignition":
level=error msg=  62: resource "aws_s3_bucket_acl" ignition {


Version-Release number of selected component (if applicable):

4.11+
 

How reproducible:

Always
 

Steps to Reproduce:

1.Create a cluster via IPI

Actual results:

install fail
 

Expected results:

install succeed
 

Additional info:

Heads-Up: Amazon S3 Security Changes Are Coming in April of 2023 - https://aws.amazon.com/blogs/aws/heads-up-amazon-s3-security-changes-are-coming-in-april-of-2023/

https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-ownership-error-responses.html - After you apply the bucket owner enforced setting for Object Ownership, ACLs are disabled.
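A sketch for checking whether the bucket-owner-enforced setting (which disables ACLs) is applied to a bucket, using the bucket name from the error above:

$ aws s3api get-bucket-ownership-controls --bucket yunjiang-acl413-4dnhx-bootstrap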

 

Description of problem:

The hypershift_hostedclusters_failure_conditions metric produced by the HyperShift operator does not report a value of 0 for conditions that no longer apply. The result is that if a hostedcluster had a failure condition at a given point, but that condition has gone away, the metric still reports a count for that condition.

Version-Release number of selected component (if applicable):

 

How reproducible:

always

Steps to Reproduce:

1. Create a HostedCluster, watch the hypershift_hostedclusters_failure_conditions metric as failure conditions occur.
2.
3.

Actual results:

A cluster count of 1 with a failure condition is reported even if the failure condition no longer applies.

Expected results:

Once failure conditions no longer apply, 0 clusters with those conditions should be reported.

Additional info:

The metric should report an accurate count for each possible failure condition of all clusters at any given time.

The CPO does not currently respect the CVO runlevels as standalone OCP does.

The CPO reconciles everything all at once during upgrades, which results in FeatureSet-aware components trying to start because the FeatureSet status is set for that version, leading to pod restarts.

It should roll things out in the following order for both initial install and upgrade, waiting between stages until rollout is complete:

  • etcd
  • kas
  • kcm and ks
  • everything else

Add the snyk-secret parameter to the push & pull Tekton files so that the Snyk scan is performed on HO RHTAP builds.

Description of problem:

When modifying a secret in the Management Console that includes a binary file (such as a keystore), the keystore gets corrupted by the modification and therefore impacts application functionality (as the keystore can no longer be read).

$ openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365
$ cat cert.pem key.pem > file.crt.txt
$ openssl pkcs12 -export -in file.crt.txt -out mykeystore.pkcs12 -name myAlias -noiter -nomaciter
$ oc create secret generic keystore --from-file=mykeystore.pkcs12 --from-file=cert.pem --from-file=key.pem -n project-300

apiVersion: v1
kind: Pod
metadata:
  name: mypod
  namespace: project-300
spec:
  containers:
  - name: mypod
    image: quay.io/rhn_support_sreber/curl:latest
    volumeMounts:
    - name: foo
      mountPath: "/keystore"
      readOnly: true
  volumes:
  - name: foo
    secret:
      secretName: keystore
      optional: true

# Getting the md5sum from the file on the local Laptop to compare with what is available in the pod
$ md5sum mykeystore.pkcs12
c189536854e59ab444720efaaa76a34a  mykeystore.pkcs12

sh-5.2# ls -al /keystore/..data/
total 16
drwxr-xr-x. 2 root root  100 Mar 24 11:19 .
drwxrwxrwt. 3 root root  140 Mar 24 11:19 ..
-rw-r--r--. 1 root root 1992 Mar 24 11:19 cert.pem
-rw-r--r--. 1 root root 3414 Mar 24 11:19 key.pem
-rw-r--r--. 1 root root 4380 Mar 24 11:19 mykeystore.pkcs12

sh-5.2# md5sum /keystore/..data/mykeystore.pkcs12
c189536854e59ab444720efaaa76a34a  /keystore/..data/mykeystore.pkcs12
sh-5.2#

Edit cert.pem in secret using the Management Console

$ oc delete pod mypod -n project-300

apiVersion: v1
kind: Pod
metadata:
  name: mypod
  namespace: project-300
spec:
  containers:
  - name: mypod
    image: quay.io/rhn_support_sreber/curl:latest
    volumeMounts:
    - name: foo
      mountPath: "/keystore"
      readOnly: true
  volumes:
  - name: foo
    secret:
      secretName: keystore
      optional: true

sh-5.2# ls -al /keystore/..data/
total 20
drwxr-xr-x. 2 root root   100 Mar 24 12:52 .
drwxrwxrwt. 3 root root   140 Mar 24 12:52 ..
-rw-r--r--. 1 root root  1992 Mar 24 12:52 cert.pem
-rw-r--r--. 1 root root  3414 Mar 24 12:52 key.pem
-rw-r--r--. 1 root root 10782 Mar 24 12:52 mykeystore.pkcs12

sh-5.2# md5sum /keystore/..data/mykeystore.pkcs12
56f04fa8059471896ed5a3c54ade707c  /keystore/..data/mykeystore.pkcs12
sh-5.2#      

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-03-23-204038   True        False         91m     Cluster version is 4.13.0-0.nightly-2023-03-23-204038

The modification was done in the Management Console by selecting the secret, then using Actions -> Edit Secrets, modifying the value of cert.pem, and submitting via the Save button

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.13.0-0.nightly-2023-03-23-204038 and 4.12.6

How reproducible:

Always

Steps to Reproduce:

1. See above the details steps

Actual results:

# md5sum on the Laptop for the file
$ md5sum mykeystore.pkcs12
c189536854e59ab444720efaaa76a34a  mykeystore.pkcs12

# md5sum of the file in the pod after the modification in the Management Console
sh-5.2# md5sum /keystore/..data/mykeystore.pkcs12
56f04fa8059471896ed5a3c54ade707c  /keystore/..data/mykeystore.pkcs12

The file got corrupted and is no longer usable. The binary file should not be modified if no changes were made to its value when editing the secret in the Management Console.

Expected results:

The binary file should not be modified if no changes were made to its value when editing the secret in the Management Console.

Additional info:

A similar problem was already fixed in https://bugzilla.redhat.com/show_bug.cgi?id=1879638, but that was for the case when the binary file was uploaded. Possibly the secret edit functionality is also missing binary file support.
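A sketch for comparing the stored secret data directly (bypassing the pod mount), to confirm whether the corruption is introduced at edit time:

$ oc get secret keystore -n project-300 \
    -o jsonpath='{.data.mykeystore\.pkcs12}' | base64 -d | md5sum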

Description of problem:

cluster-ingress-operator E2E has an error message:

[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:

Looks like newClient is called from two places, TestMain and TestIngressStatus

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Run E2E tests that call newClient, such as TestIngressStatus
2. Examine logs

Actual results:

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/924/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1663696029016395776/build-log.txt 

[controller-runtime] log.SetLogger(...) was never called, logs will not be displayed:
goroutine 9120 [running]:
runtime/debug.Stack()
	/usr/lib/golang/src/runtime/debug/stack.go:24 +0x65
sigs.k8s.io/controller-runtime/pkg/log.eventuallyFulfillRoot()
	/go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/log.go:59 +0xbd
sigs.k8s.io/controller-runtime/pkg/log.(*delegatingLogSink).WithName(0xc000113000, {0x1dd106b, 0x14})
	/go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/log/deleg.go:147 +0x4c
github.com/go-logr/logr.Logger.WithName({{0x21435e0, 0xc000113000}, 0x0}, {0x1dd106b?, 0xe?})
	/go/src/github.com/openshift/cluster-ingress-operator/vendor/github.com/go-logr/logr/logr.go:336 +0x46
sigs.k8s.io/controller-runtime/pkg/client.newClient(0xc00086afc0, {0x0, 0xc0001a0fc0, {0x2144930, 0xc00033ac00}, 0x0, {0x0, 0x0}, 0x0})
	/go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:115 +0xb4
sigs.k8s.io/controller-runtime/pkg/client.New(0xc00086afc0?, {0x0, 0xc0001a0fc0, {0x2144930, 0xc00033ac00}, 0x0, {0x0, 0x0}, 0x0})
	/go/src/github.com/openshift/cluster-ingress-operator/vendor/sigs.k8s.io/controller-runtime/pkg/client/client.go:101 +0x85
github.com/openshift/cluster-ingress-operator/pkg/operator/client.NewClient(0x0?)
	/go/src/github.com/openshift/cluster-ingress-operator/pkg/operator/client/client.go:83 +0x145
github.com/openshift/cluster-ingress-operator/test/e2e.TestIngressStatus(0xc000503520)
	/go/src/github.com/openshift/cluster-ingress-operator/test/e2e/dns_ingressdegrade_test.go:33 +0x95
testing.tRunner(0xc000503520, 0x1f015a0)
	/usr/lib/golang/src/testing/testing.go:1576 +0x10b
created by testing.(*T).Run
	/usr/lib/golang/src/testing/testing.go:1629 +0x3ea

Expected results:

No error message

Additional info:

This is due to 1.27 rebase

Description of problem:

In certain cases, an AWS cluster running 4.12 doesn't automatically generate a controlplanemachineset when it's expected to.

It looks like CPMS is looking for `infrastructure.Spec.PlatformSpec.Type` (https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/2aeaaf9ec714ee75f933051c21a44f648d6ed42b/pkg/controllers/controlplanemachinesetgenerator/controller.go#L180) and, as a result, clusters born earlier than 4.5, when this field was introduced (https://github.com/openshift/installer/pull/3277), will not be able to generate a CPMS.

I believe we should be looking at `infrastructure.Status.PlatformStatus.Type` instead

Version-Release number of selected component (if applicable):

4.12.9

How reproducible:

Consistent

Steps to Reproduce:

1. Install a cluster on a version earlier than 4.5
2. Upgrade cluster through to 4.12
3. Observe "Unable to generate control plane machine set, unsupported platform" error message from the control-plane-machine-set-operator, as well as the missing CPMS object in the openshift-machine-api namespace

Actual results:

No generated CPMS is created, despite the platform being AWS

Expected results:

A generated CPMS existing in the openshift-machine-api namespace

Additional info:
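A sketch for checking whether a cluster is affected; per the description above, clusters born before 4.5 may have an empty spec.platformSpec.type even though status.platformStatus.type reports AWS:

$ oc get infrastructure cluster -o jsonpath='{.spec.platformSpec.type}{"\n"}'
$ oc get infrastructure cluster -o jsonpath='{.status.platformStatus.type}{"\n"}'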


Description of problem:

1. CR.status.LastSyncTimestamp should also be updated in the "else" code branch: 
https://github.com/openshift/cloud-credential-operator/blob/4cb9faca62c31ebea9a11b55f7af764be4ee2cd8/pkg/operator/credentialsrequest/credentialsrequest_controller.go#L1054

2. r.Client.Status().Update is not called on the CR object in memory after this line:
https://github.com/openshift/cloud-credential-operator/blob/4cb9faca62c31ebea9a11b55f7af764be4ee2cd8/pkg/operator/credentialsrequest/credentialsrequest_controller.go#L713
So CR.status.conditions are not updated. 

Steps to Reproduce:

This results from a static code check.

This is a clone of issue OCPBUGS-15910. The following is the description of the original issue:

$ oc get mc 01-master-kubelet -o json | jq -r '.spec.config.systemd.units | .[] | select(.name=="kubelet.service") | .contents'
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service network-online.target
Requires=crio.service kubelet-auto-node-size.service
After=network-online.target crio.service kubelet-auto-node-size.service
After=ostree-finalize-staged.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
ExecStartPre=/bin/rm -f /var/lib/kubelet/memory_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env
EnvironmentFile=/etc/node-sizing.env

ExecStart=/usr/local/bin/kubenswrapper \
    /usr/bin/kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --runtime-cgroups=/system.slice/crio.service \
      --node-labels=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
      --node-ip=${KUBELET_NODE_IP} \
      --minimum-container-ttl-duration=6m0s \
      --cloud-provider= \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
       \
      --hostname-override=${KUBELET_NODE_NAME} \
      --provider-id=${KUBELET_PROVIDERID} \
      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4c0a1b82501a416df4b926801bc3aa378d2762d0570a0791c6675db1a3365c62 \
      --system-reserved=cpu=${SYSTEM_RESERVED_CPU},memory=${SYSTEM_RESERVED_MEMORY},ephemeral-storage=${SYSTEM_RESERVED_ES} \
      --v=${KUBELET_LOG_LEVEL}

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target 

https://github.com/openshift/machine-config-operator/blob/29b3729923273ae7f42cd20e096fa1a390d4b108/templates/master/01-master-kubelet/_base/units/kubelet.service.yaml#L33

Description of problem:

2023-02-20T16:27:58.107800612Z + oc observe pods -n openshift-sdn --listen-addr= -l app=sdn -a '{ .status.hostIP }' -- /var/run/add_iptables.sh
2023-02-20T16:27:58.181727766Z Flag --argument has been deprecated, and will be removed in a future release. Use --template instead.

 

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-02-17-090603
 

How reproducible:

Always

Steps to Reproduce:

1. Deploy Azure OpenShiftSDN cluster
2. Check drop-icmp container logs
oc logs -n openshift-sdn -c drop-icmp -l app=sdn --previous
3. 

Actual results:

+ true
+ iptables -F AZURE_ICMP_ACTION
+ iptables -A AZURE_ICMP_ACTION -j LOG
+ iptables -A AZURE_ICMP_ACTION -j DROP
+ oc observe pods -n openshift-sdn --listen-addr= -l app=sdn -a '{ .status.hostIP }' -- /var/run/add_iptables.sh
Flag --argument has been deprecated, and will be removed in a future release. Use --template instead.
E0220 16:27:07.553592   27842 memcache.go:238] couldn't get current server API group list: Get "https://172.30.0.1:443/api?timeout=32s": dial tcp 172.30.0.1:443: connect: connection refused
E0220 16:27:07.553913   27842 memcache.go:238] couldn't get current server API group list: Get "https://172.30.0.1:443/api?timeout=32s": dial tcp 172.30.0.1:443: connect: connection refused
The connection to the server 172.30.0.1:443 was refused - did you specify the right host or port?
Error from server (BadRequest): previous terminated container "drop-icmp" in pod "sdn-v7gqq" not found

 

Expected results:

No deprecation warning

Additional info:

Hypershift needs to be able to specify a different release payload for control plane components without redeploying anything in the hosted cluster.

csi-driver-node DaemonSet pods in the hosted cluster and the csi-driver-controller Deployment that runs in the control plane both use the AWS_EBS_DRIVER_IMAGE and LIVENESS_PROBE_IMAGE

https://github.com/openshift/hypershift/blob/fc42313fc93125799f7eba5361190043cc2f6561/control-plane-operator/controllers/hostedcontrolplane/storage/envreplace.go#L9-L48

We need a way to specify these images separately for csi-driver-node and csi-driver-controller.

Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver/pull/33

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/bond-cni/pull/52

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-25943. The following is the description of the original issue:

Description of problem:

Adding a test case verifying that exceeding the openshift.io/image-tags quota prevents creating new image references in the project.

Version-Release number of selected component (if applicable):

    4.16
pr - https://github.com/openshift/origin/pull/28464
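For context, a minimal sketch of the kind of quota the new test exercises (the name and limit value here are hypothetical):

apiVersion: v1
kind: ResourceQuota
metadata:
  name: image-tags-quota
spec:
  hard:
    openshift.io/image-tags: "5"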

Description of problem:
If secure boot is currently disabled and the user attempts to enable it via ZTP, the install will not begin the first time ZTP is triggered.

When secure boot is enabled via ZTP, the boot options are configured before the virtual CD is attached, so the first boot goes into the existing HD with secure boot on. The install then gets stuck because boot from CD is never triggered.

Version-Release number of selected component (if applicable):
4.10

How reproducible:
Always

Steps to Reproduce:
1. Secure boot is currently disabled in bios
2. Attempt to deploy a cluster with secure boot enabled via ZTP
3.

Actual results:

  • spoke cluster got booted with secure boot option toggled, into existing HD
  • spoke cluster did not boot into virtual CD, thus install never started.
  • agentclusterinstall gets stuck here:
    State: insufficient
    State Info: Cluster is not ready for install

Expected results:

  • installation started and completed successfully

Additional info:

Secure boot config used in ZTP siteconfig:
http://registry.kni-qe-0.lab.eng.rdu2.redhat.com:3000/kni-qe/ztp-site-configs/src/ff814164cdcd355ed980f1edf269dbc2afbe09aa/siteconfig/master-2.yaml#L40

Description of the problem:
Please see Screening.
Once installation of a cluster with a valid custom manifest has started, the manifest is no longer listable: it is not shown in the UI, not mentioned in the cluster logs, and not returned when calling api/assisted-install/v2/clusters/{}/manifests.
Before installation the manifest is listed; however, after installation starts the HTTP API returns an error:

{ "code": "500", "href": "", "id": 500, "kind": "Error", "reason": "Cannot list file 3a46c77e-bafc-4b66-87c8-80fe4e18806c/manifests/openshift/50-masters-chrony-configuration.yaml in cluster 3a46c77e-bafc-4b66-87c8-80fe4e18806c" }

 

How reproducible:
100%
 

Steps to reproduce:

1. Created a cluster with a custom manifest

2. Was able to see the manifest in the cluster details on the installation page (before installation started)

3. Also able to retrieve it via an HTTP GET request

4. Started the installation
Actual results:
The custom manifest is no longer visible and is not mentioned in the logs.
The HTTP GET request returns the above-mentioned error (500).
It seems the custom manifest was not added.

Expected results:
The manifest should still be visible and applied.
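For reference, the listing call mentioned above takes the following shape (a sketch; endpoint, token, and cluster ID are environment-specific):

$ curl -s -H "Authorization: Bearer <token>" \
    "https://<assisted-service>/api/assisted-install/v2/clusters/<cluster-id>/manifests"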

Description of the problem:

Change the user message from: "Host is not compatible with cluster platform %s; either disable this host or choose a compatible cluster platform (%v)" to "Host is not compatible with cluster platform %s; either disable this host or discover a new, compatible host."

How reproducible:

100%

Steps to reproduce:

1.

2.

3.

Actual results:

 

Expected results:

Please review the following PR: https://github.com/openshift/telemeter/pull/452

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

A clone of https://issues.redhat.com/browse/OCPBUGS-11143 but for the downstream openshift/cloud-provider-azure

 

Description of problem:

On Azure, after deleting a master, the old machine is stuck in Deleting and some pods in the cluster are in ImagePullBackOff. Checking from the Azure console, the new master was not added to the load balancer backend; this seems to leave the machine without internet connectivity.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-02-12-024338

How reproducible:

Always

Steps to Reproduce:

1. Set up a cluster on Azure, networkType ovn
2. Delete a master
3. Check master and pod

Actual results:

Old machine stuck in Deleting,  some pods are in ImagePullBackOff.
 $ oc get machine    
NAME                                    PHASE      TYPE              REGION   ZONE   AGE
zhsunaz2132-5ctmh-master-0              Deleting   Standard_D8s_v3   westus          160m
zhsunaz2132-5ctmh-master-1              Running    Standard_D8s_v3   westus          160m
zhsunaz2132-5ctmh-master-2              Running    Standard_D8s_v3   westus          160m
zhsunaz2132-5ctmh-master-flqqr-0        Running    Standard_D8s_v3   westus          105m
zhsunaz2132-5ctmh-worker-westus-dhwfz   Running    Standard_D4s_v3   westus          152m
zhsunaz2132-5ctmh-worker-westus-dw895   Running    Standard_D4s_v3   westus          152m
zhsunaz2132-5ctmh-worker-westus-xlsgm   Running    Standard_D4s_v3   westus          152m

$ oc describe machine zhsunaz2132-5ctmh-master-flqqr-0  -n openshift-machine-api |grep -i "Load Balancer"
      Internal Load Balancer:  zhsunaz2132-5ctmh-internal
      Public Load Balancer:      zhsunaz2132-5ctmh

$ oc get node            
NAME                                    STATUS     ROLES                  AGE    VERSION
zhsunaz2132-5ctmh-master-0              Ready      control-plane,master   165m   v1.26.0+149fe52
zhsunaz2132-5ctmh-master-1              Ready      control-plane,master   165m   v1.26.0+149fe52
zhsunaz2132-5ctmh-master-2              Ready      control-plane,master   165m   v1.26.0+149fe52
zhsunaz2132-5ctmh-master-flqqr-0        NotReady   control-plane,master   109m   v1.26.0+149fe52
zhsunaz2132-5ctmh-worker-westus-dhwfz   Ready      worker                 152m   v1.26.0+149fe52
zhsunaz2132-5ctmh-worker-westus-dw895   Ready      worker                 152m   v1.26.0+149fe52
zhsunaz2132-5ctmh-worker-westus-xlsgm   Ready      worker                 152m   v1.26.0+149fe52
$ oc describe node zhsunaz2132-5ctmh-master-flqqr-0
  Warning  ErrorReconcilingNode       3m5s (x181 over 108m)  controlplane         [k8s.ovn.org/node-chassis-id annotation not found for node zhsunaz2132-5ctmh-master-flqqr-0, macAddress annotation not found for node "zhsunaz2132-5ctmh-master-flqqr-0" , k8s.ovn.org/l3-gateway-config annotation not found for node "zhsunaz2132-5ctmh-master-flqqr-0"]

$ oc get po --all-namespaces | grep ImagePullBackOf   
openshift-cluster-csi-drivers                      azure-disk-csi-driver-node-l8ng4                                  0/3     Init:ImagePullBackOff   0              113m
openshift-cluster-csi-drivers                      azure-file-csi-driver-node-99k82                                  0/3     Init:ImagePullBackOff   0              113m
openshift-cluster-node-tuning-operator             tuned-bvvh7                                                       0/1     ImagePullBackOff        0              113m
openshift-dns                                      node-resolver-2p4zq                                               0/1     ImagePullBackOff        0              113m
openshift-image-registry                           node-ca-vxv87                                                     0/1     ImagePullBackOff        0              113m
openshift-machine-config-operator                  machine-config-daemon-crt5w                                       1/2     ImagePullBackOff        0              113m
openshift-monitoring                               node-exporter-mmjsm                                               0/2     Init:ImagePullBackOff   0              113m
openshift-multus                                   multus-4cg87                                                      0/1     ImagePullBackOff        0              113m
openshift-multus                                   multus-additional-cni-plugins-mc6vx                               0/1     Init:ImagePullBackOff   0              113m
openshift-ovn-kubernetes                           ovnkube-master-qjjsv                                              0/6     ImagePullBackOff        0              113m
openshift-ovn-kubernetes                           ovnkube-node-k8w6j                                                0/6     ImagePullBackOff        0              113m

Expected results:

Replace master successful

Additional info:

Tested payload 4.13.0-0.nightly-2023-02-03-145213 with the same result.
We had previously tested 4.13.0-0.nightly-2023-01-27-165107 and everything worked well.
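A sketch for checking from the CLI whether the replacement master was added to the load balancer backend pools (load balancer names are taken from the machine description above; the resource group is environment-specific):

$ az network lb address-pool list -g <resource-group> --lb-name zhsunaz2132-5ctmh-internal -o table
$ az network lb address-pool list -g <resource-group> --lb-name zhsunaz2132-5ctmh -o table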

Description of problem:

Pipelines as Code has been GA for some time, so we should remove the Tech Preview badge from the PAC pages.

Version-Release number of selected component (if applicable):

4.13

Description of problem:

The oc idle tests do not expect the deprecation warning in the output and break.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Run the test
2. Watch it fail
3.

Actual results:

Error running /usr/bin/oc --namespace=e2e-test-oc-idle-hns4c --kubeconfig=/tmp/configfile3347652119 describe deploymentconfigs v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
deploymentconfig.apps.openshift.io:
StdOut>
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
Error from server (NotFound): deploymentconfigs.apps.openshift.io "v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
deploymentconfig.apps.openshift.io" not found
StdErr>
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
Error from server (NotFound): deploymentconfigs.apps.openshift.io "v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
deploymentconfig.apps.openshift.io" not found
exit status 1

Expected results:

Tests should pass

Additional info:

I have tracked down the problem to this line: https://github.com/openshift/origin/blob/master/test/extended/cli/idle.go#LL49C40-L49C40

deploymentConfigName gets assigned to "v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+ deploymentconfig.apps.openshift.io", which leads to the next command not finding a deployment config.

Description of problem:

Links for both markdown documents in console-dynamic-plugin-sdk/docs are not working.
Check https://github.com/openshift/console/blob/master/frontend/packages/console-dynamic-plugin-sdk/docs

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

Clicking on a link in any markdown doc is not taking user to the appropriate section.

Expected results:

Clicking on a link in any markdown doc should take user to the appropriate section.

Additional info:

 

Description of problem:

The description for the BuildAdapter SDK extension is wrong.

Actual results:

BuildAdapter contributes an adapter to adapt element to data that can be used by Pod component

Expected results:

BuildAdapter contributes an adapter to adapt element to data that can be used by Build component

Additional info:

 

Description of problem:

When the configuration is installed with the config-image, the kubeadmin password is not accepted when logging in to the console.

Version-Release number of selected component (if applicable):

 

How reproducible:

Every time

Steps to Reproduce:

1. Build and install unconfigured ignition
2. Build and install config-image
3. When able to ssh into host0, attempt to log in to the console using the core user and the generated kubeadmin password.

Actual results:

The login fails.

Expected results:

The login should succeed.

Additional info:

 

Please review the following PR: https://github.com/openshift/etcd/pull/208

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Most contents on "Command Line Tools" page are not i18n.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-03-10-165006

How reproducible:

Always

Steps to Reproduce:

1.Go to "?"-> "Command Line Tools" page. Add "?pseudolocalization=true&lng=en" at the end of the url. Check if all contents are i18n.
2.
3.

Actual results:

1. Most of the content is not internationalized.

Expected results:

1. All content should be internationalized.

Additional info:


This is a clone of issue OCPBUGS-18246. The following is the description of the original issue:

Description of problem:

Role assignment for Azure AD Workload Identity performed by ccoctl does not provide an option to scope role assignments to a resource group containing customer vnet in a byo vnet installation workflow.

https://docs.openshift.com/container-platform/4.13/installing/installing_azure/installing-azure-vnet.html

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

100%

Steps to Reproduce:

1. Create Azure resource group and vnet for OpenShift within that resource group.
2. Create Azure AD Workload Identity infrastructure with ccoctl.
3. Follow steps to configure existing vnet for installation setting networkResourceGroupName within the install config.
4. Attempt cluster installation.

Actual results:

Cluster installation fails.

Expected results:

Cluster installation succeeds.

Additional info:

ccoctl must be extended to accept a parameter specifying the network resource group name and scope relevant component role assignments to the network resource group in addition to the installation resource group.

Description of the problem:

Proliant Gen 11 always reports the serial number "PCA_number.ACC", causing all hosts to register with the same UUID.

How reproducible:

100%

Steps to reproduce:

1. Boot two Proliant Gen 11 hosts

2. See that both hosts are updating a single host entry in the service

Actual results:

All hosts with this hardware are assigned the same UUID

Expected results:

Each host should have a unique UUID

Description of problem:

https://github.com/kubernetes/kubernetes/issues/118916

Version-Release number of selected component (if applicable):

4.14

How reproducible:

100% 

Steps to Reproduce:

1. compare memory usage from v1 and v2 and notice differences with the same workloads
2.
3.

Actual results:

they slightly differ because of accounting differences 

Expected results:

they should be largely the same

Additional info:
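A sketch of how the two accountings can be compared on a node (paths assume the systemd cgroup driver; the kubepods.slice path may differ in other setups):

# cgroup v2
$ cat /sys/fs/cgroup/kubepods.slice/memory.current
# cgroup v1
$ cat /sys/fs/cgroup/memory/kubepods.slice/memory.usage_in_bytes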

 

This is a clone of issue OCPBUGS-19699. The following is the description of the original issue:

Description of problem:


When CPUPartitioning is not set in install-config.yaml a warning message is still generated

WARNING CPUPartitioning:  is ignored

This warning is both incorrect, since the check is against "None" and the value is an empty string when not set, and also no longer relevant now that https://issues.redhat.com//browse/OCPBUGS-18876 has been fixed.

Version-Release number of selected component (if applicable):


How reproducible:

Every time

Steps to Reproduce:

1. Create an install config with CPUPartitioning not set
2. Run "openshift-install agent create image --dir cluster-manifests/ --log-level debug"

Actual results:

See the output "WARNING CPUPartitioning:  is ignored"

Expected results:

No warning

Additional info:


OCP 4.11 ships the alertingrules CRD as a techpreview feature. Before graduating to GA we need to have e2e tests in the CMO repository.

AC:

  • End-to-end tests in the CMO repository validating that
    • Admins can create/update/delete alertingrules
    • Invalid resources are rejected and invalid alertingrules don't break the system
  • Configuration of a blocking job in openshift/release.

Description of problem:

The must-gather should contain additional debug information, such as the current configuration and firmware settings of any BlueField / Mellanox devices, when using SR-IOV

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

nmstate packages > 2.2.9 will cause MCD firstboot to fail. For now, let's pin the nmstate version and fix properly via https://github.com/openshift/machine-config-operator/pull/3720

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of the problem:

When installing a cluster with multiple networks, we cannot change the machine network from the UI (it is not changed to the new machine network), but when installing it shows the chosen network.

From the customer's view: the customer chooses a machine network; it is in the list but is never shown as chosen, yet it actually appears when installing.

How reproducible:

Always

Steps to reproduce:

Install a cluster with multiple networks

Try to change the machine network -> it does not change

Actual results:

 

Expected results:

Description of problem:

Not able to provision a new baremetalhost because ironic is not able to find a suitable virtual media device.

Version-Release number of selected component (if applicable):

 

How reproducible:

100% if you have a UCS Blade

Steps to Reproduce:

1. add the baremetalhost 
2. wait for the error
3.

Actual results:

No suitable virtual media device found.

Expected results:

That the provisioning would succeed.

Additional info:

I tried to insert an ISO using curl and I can do it on the virtualmedia[3] device, which is a virtual DVD.

When I look at the metal3-ironic logs I can see the following entry:
Received representation of VirtualMedia /redfish/v1/Managers/CIMC/VirtualMedia/3: {'_actions': {'eject_media': {'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Managers/CIMC/VirtualMedia/3/Actions/VirtualMedia.EjectMedia'}, 'insert_media': {'operation_apply_time_support': None, 'target_uri': '/redfish/v1/Managers/CIMC/VirtualMedia/3/Actions/VirtualMedia.InsertMedia'}}, '_certificates_path': None, '_oem_vendors': ['Cisco'], 'connected_via': <ConnectedVia.URI: 'URI'>, 'identity': '3', 'image': None, 'image_name': None, 'inserted': False, 'links': None, 'media_types': [<VirtualMediaType.DVD: 'DVD'>], 'name': 'CIMC-Mapped vDVD', 'status': {'health': <Health.OK: 'OK'>, 'health_rollup': None, 'state': <State.DISABLED: 'Disabled'>}, 'transfer_method': None, 'user_name': None, 'verify_certificate': None, 'write_protected': False}

I'm sure this is the correct device, and verified that I can insert vmedia using curl.

Somehow metal3/ironic is not selecting this device.
I suspect the reason is that "DVD" is not considered a valid media_type.
When I look at [the ironic code](https://github.com/openstack/ironic/blob/b4f8209b99af32d8d2a646591af9b62436aad3d8/ironic/drivers/modules/redfish/boot.py#LL188C31-L188C31) I can see that there is a check for the media_type.

I'm not able to see which values are accepted by metal3.

I was able to check the media_types on a rackmount server that works, and there I see the following values: "CD, DVD".

This led me to believe that DVD is not an accepted value.

Can you please confirm that this is the case and if so, can we add the DVD as a suitable device?
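
For illustration only, a minimal Go sketch of the suspected allow-list behaviour; the accepted set mirrors the reporter's guess that "DVD" is missing, and the actual metal3/ironic code (which is Python) may differ:

package main

import "fmt"

// suitableDevice reports whether any of the device's media_types is in the
// accepted set; if "DVD" is not accepted, a DVD-only device is skipped.
func suitableDevice(mediaTypes []string, accepted map[string]bool) bool {
    for _, t := range mediaTypes {
        if accepted[t] {
            return true
        }
    }
    return false
}

func main() {
    accepted := map[string]bool{"CD": true} // hypothetical: "DVD" absent
    // prints "false", which would surface as "No suitable virtual media device found."
    fmt.Println(suitableDevice([]string{"DVD"}, accepted))
}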

 

This is a clone of issue OCPBUGS-18304. The following is the description of the original issue:

Description of problem:

https://github.com/openshift/installer/pull/6770 reverted part of https://github.com/openshift/installer/pull/5788 which has set guestinfo.domain for bootstrap machine. This breaks some OKD installations, which require that setting

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

A cluster installed with baselineCapabilitySet: None has the build resource available while the build capability is disabled.


❯ oc get -o json clusterversion version | jq '.spec.capabilities'                      
{
  "baselineCapabilitySet": "None"
}

❯ oc get -o json clusterversion version | jq '.status.capabilities.enabledCapabilities'
null

❯ oc get build -A                   
NAME      AGE
cluster   5h23m

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-04-143709

How reproducible:

100%

Steps to Reproduce:

1. Install a cluster with baselineCapabilitySet: None

Actual results:

❯ oc get build -A                   
NAME      AGE
cluster   5h23m

Expected results:

❯ oc get -A build
error: the server doesn't have a resource type "build"

 

slack thread with more info: https://redhat-internal.slack.com/archives/CF8SMALS1/p1696527133380269

Description of problem:

Pod Status Overlapping in Sidebar
Status that is breaking the UI: CreateContainerConfigError

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always when the status is CreateContainerConfigError

Steps to Reproduce:

1. Create a Pod that gives CreateContainerConfigError

Sample YAML:

apiVersion: v1
kind: Pod
metadata:
  name: example
  labels:
    app: httpd
  namespace: avik
spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: httpd
      image: docker.io/httpd:latest
      ports:
        - containerPort: 80
      securityContext:
        allowPrivilegeEscalation: true
        capabilities:
          drop:
            - ALL 

Actual results:

The Pod status overlaps in the sidebar when the status text is long.

Expected results:

The Pod Status should not overlap. Also, this error status should look like the other error statuses.

Additional info:

 

Description of problem:

While mirroring to the filesystem, if a 429 error is received from the registry, the layer is incorrectly flagged as having been mirrored and is therefore not picked up by subsequent mirror re-runs. This gives the impression that the second mirror-to-filesystem attempt succeeded, but it then causes failures when mirroring from the filesystem to the target registry, due to the missing files.

Version-Release number of selected component (if applicable):

oc version
Client Version: 4.8.42
Server Version: 4.8.14
Kubernetes Version: v1.21.1+a620f50

How reproducible:

When a 429 occurs while mirroring to the filesystem

Steps to Reproduce:

1. Run mirror to filesystem command : oc image mirror -f mirror-to-filesystem.txt --filter-by-os '.*' -a $REGISTRY_AUTH_FILE --insecure --skip-multiple-scopes --max-per-registry=1 --continue-on-error=true --dir "$LOCAL_DIR_PATH"  

Output: 
info: Mirroring completed in 2h19m24.14s (25.75MB/s)
error: one or more errors occurred 
E.g
error: unable to push <registry>/namespace/<image-name>: failed to retrieve blob <image-digest>: error parsing HTTP 429 response body: unexpected end of JSON input: ""


2. Re Run mirror to filesystem command : oc image mirror -f mirror-to-filesystem.txt --filter-by-os '.*' -a $REGISTRY_AUTH_FILE --insecure --skip-multiple-scopes --max-per-registry=1 --continue-on-error=true --dir "$LOCAL_DIR_PATH"

Output:
info: Mirroring completed in 480ms (0B/s)


3. Run mirror from filesystem command : oc image mirror -f mirror-from-filesystem.txt -a $REGISTRY_AUTH_FILE --from-dir "$LOCAL_DIR_PATH" --filter-by-os '.*' --insecure --skip-multiple-scopes --max-per-registry=1 --continue-on-error=true

Output: 
info: Mirroring completed in 53m5.21s (67.61MB/s)
error: one or more errors occurred
E.g
error: unable to push file://local/namespace/<image-name>: failed to retrieve blob <image-digest>: open /root/local/namespace/<image-name>/blobs/<image-digest>: no such file or directory

 

Actual results:

1) mirror to filesystem first attempt: 

info: Mirroring completed in 2h19m24.14s (25.75MB/s) 
error: one or more errors occurred 
E.g
error: unable to push <registry>/namespace/<image-name>: failed to retrieve blob <image-digest>: error parsing HTTP 429 response body: unexpected end of JSON input: ""

2) mirror to filesystem second attempt: 

info: Mirroring completed in 480ms (0B/s)

 
3) mirror from filesystem to target registry:  

info: Mirroring completed in 53m5.21s (67.61MB/s) 
error: one or more errors occurred 
E.g 
error: unable to push file://local/namespace/<image-name>: failed to retrieve blob <image-digest>: open /root/local/namespace/<image-name>/blobs/<image-digest>: no such file or directory

Expected results:

source image mirror -> to file system and image mirror from file system -> target registry should complete successfully

Additional info:

 

Description of problem:

During a fresh install of an operator with conversion webhooks enabled, `crd.spec.conversion.webhook.clientConfig` is dynamically updated initially, as expected, with the proper webhook ns, name, & caBundle. However, within a few seconds, those critical settings are overwritten with the bundle’s packaged CRD conversion settings. This breaks the operator and stops the installation from completing successfully.

Oddly though, if that same operator version is installed as part of an upgrade from a prior release... the dynamic clientConfig settings are retained and all works as expected.

 

Version-Release number of selected component (if applicable):

OCP 4.10.36
OCP 4.11.18

How reproducible:

Consistently

 

Steps to Reproduce:

1. oc apply -f https://gist.githubusercontent.com/tchughesiv/0951d40f58f2f49306cc4061887e8860/raw/3c7979b58705ab3a9e008b45a4ed4abc3ef21c2b/conversionIssuesFreshInstall.yaml
2. oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' -w

 

Actual results:

Eventually, the clientConfig settings will revert to the following and stay that way.

$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}'
map[service:map[name:dbaas-operator-webhook-service namespace:openshift-dbaas-operator path:/convert port:443]]
 conversion:
   strategy: Webhook
   webhook:
     clientConfig:
       service:
         namespace: openshift-dbaas-operator
         name: dbaas-operator-webhook-service
         path: /convert
         port: 443
     conversionReviewVersions:
       - v1alpha1
       - v1beta1

 

Expected results:

The `crd.spec.conversion.webhook.clientConfig` should instead retain the following settings.

$ oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}'
map[caBundle:LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJpRENDQVMyZ0F3SUJBZ0lJUVA1b1ZtYTNqUG93Q2dZSUtvWkl6ajBFQXdJd0dERVdNQlFHQTFVRUNoTU4KVW1Wa0lFaGhkQ3dnU1c1akxqQWVGdzB5TWpFeU1UWXhPVEEwTWpsYUZ3MHlOREV5TVRVeE9UQTBNamxhTUJneApGakFVQmdOVkJBb1REVkpsWkNCSVlYUXNJRWx1WXk0d1dUQVRCZ2NxaGtqT1BRSUJCZ2dxaGtqT1BRTUJCd05DCkFBVGcxaEtPWW40MStnTC9PdmVKT21jbkx5MzZNWTBEdnRGcXF3cjJFdlZhUWt2WnEzWG9ZeWlrdlFlQ29DZ3QKZ2VLK0UyaXIxNndzSmRSZ2paYnFHc3pGbzJFd1h6QU9CZ05WSFE4QkFmOEVCQU1DQW9Rd0hRWURWUjBsQkJZdwpGQVlJS3dZQkJRVUhBd0lHQ0NzR0FRVUZCd01CTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRFZSME9CQllFCkZPMWNXNFBrbDZhcDdVTVR1UGNxZWhST1gzRHZNQW9HQ0NxR1NNNDlCQU1DQTBrQU1FWUNJUURxN0pkUjkxWlgKeWNKT0hyQTZrL0M0SG9sSjNwUUJ6bmx3V3FXektOd0xiZ0loQU5ObUd6RnBqaHd6WXpVY2RCQ3llU3lYYkp3SAphYllDUXFkSjBtUGFha28xCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K service:map[name:dbaas-operator-controller-manager-service namespace:redhat-dbaas-operator path:/convert port:443]]
 conversion:
   strategy: Webhook
   webhook:
     clientConfig:
       service:
         namespace: redhat-dbaas-operator
         name: dbaas-operator-controller-manager-service
         path: /convert
         port: 443
       caBundle: >-
         LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUJoekNDQVMyZ0F3SUJBZ0lJZXdhVHNLS0hhbWd3Q2dZSUtvWkl6ajBFQXdJd0dERVdNQlFHQTFVRUNoTU4KVW1Wa0lFaGhkQ3dnU1c1akxqQWVGdzB5TWpFeU1UWXhPVEF5TURkYUZ3MHlOREV5TVRVeE9UQXlNRGRhTUJneApGakFVQmdOVkJBb1REVkpsWkNCSVlYUXNJRWx1WXk0d1dUQVRCZ2NxaGtqT1BRSUJCZ2dxaGtqT1BRTUJCd05DCkFBUVRFQm8zb1BWcjRLemF3ZkE4MWtmaTBZQTJuVGRzU2RpMyt4d081ZmpKQTczdDQ2WVhOblFzTjNCMVBHM04KSXJ6N1dKVkJmVFFWMWI3TXE1anpySndTbzJFd1h6QU9CZ05WSFE4QkFmOEVCQU1DQW9Rd0hRWURWUjBsQkJZdwpGQVlJS3dZQkJRVUhBd0lHQ0NzR0FRVUZCd01CTUE4R0ExVWRFd0VCL3dRRk1BTUJBZjh3SFFZRFZSME9CQllFCkZJemdWbC9ZWkFWNmltdHl5b0ZkNFRkLzd0L3BNQW9HQ0NxR1NNNDlCQU1DQTBnQU1FVUNJRUY3ZXZ0RS95OFAKRnVrTUtGVlM1VkQ3a09DRzRkdFVVOGUyc1dsSTZlNEdBaUVBZ29aNmMvYnNpNEwwcUNrRmZSeXZHVkJRa25SRwp5SW1WSXlrbjhWWnNYcHM9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K 

 

Additional info:

If the operator is, instead, installed as an upgrade... vs a fresh install... the webhook settings are properly/permanently set and everything works as expected. This can be tested in a fresh cluster like this.

1. oc apply -f https://gist.githubusercontent.com/tchughesiv/703109961f22ab379a45a401be0cf351/raw/2d0541b76876a468757269472e8e3a31b86b3c68/conversionWorksUpgrade.yaml
2. oc get crd dbaasproviders.dbaas.redhat.com --template '{{ .spec.conversion.webhook.clientConfig }}' -w

Sanitize OWNERS/OWNER_ALIASES:

1) OWNERS must have:

component: "Storage / Kubernetes External Components"

2) OWNER_ALIASES must have all team members of Storage team.

Description of problem:

console-operator may panic when IncludeNamesFilter receives an object from a shared informer event of type cache.DeletedFinalStateUnknown.

Example job with panic: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1687876857824808960

Specific log that shows the full stack trace: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1687876857824808960/artifacts/e2e-aws-sdn-serial/gather-extra/artifacts/pods/openshift-console-operator_console-operator-748d7c6cdd-vwxmx_console-operator.log
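
A minimal Go sketch of tombstone-safe handling with client-go; the helper name and the metav1.Object assertion are illustrative, not the console-operator's actual filter code:

package filters

import (
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/tools/cache"
)

// objectName unwraps the cache.DeletedFinalStateUnknown tombstone that shared
// informers can deliver on missed deletions, instead of assuming the event
// object is always the resource itself.
func objectName(obj interface{}) (string, bool) {
    if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok {
        obj = tombstone.Obj // last known state of the deleted object
    }
    metaObj, ok := obj.(metav1.Object)
    if !ok {
        return "", false // unexpected type: skip rather than panic
    }
    return metaObj.GetName(), true
}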

Version-Release number of selected component (if applicable):

 

How reproducible:

Sporadically

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-19059. The following is the description of the original issue:

Description of problem:

On a baremetal 4.14.0-rc.0 IPv6 SNO cluster, logged in to the admin console as an admin user, there is no Observe menu on the left navigation bar (see picture: https://drive.google.com/file/d/13RAXPxtKhAElN9xf8bAmLJa0GI8pP0fH/view?usp=sharing), and the monitoring-plugin status is Failed (see: https://drive.google.com/file/d/1YsSaGdLT4bMn-6E-WyFWbOpwvDY4t6na/view?usp=sharing); the error is

Failed to get a valid plugin manifest from /api/plugins/monitoring-plugin/
r: Bad Gateway 

Checking the console logs shows connection refused on port 9443:

$ oc -n openshift-console logs console-6869f8f4f4-56mbj
...
E0915 12:50:15.498589       1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused
2023/09/15 12:50:15 http: panic serving [fd01:0:0:1::2]:39156: runtime error: invalid memory address or nil pointer dereference
goroutine 183760 [running]:
net/http.(*conn).serve.func1()
    /usr/lib/golang/src/net/http/server.go:1854 +0xbf
panic({0x3259140, 0x4fcc150})
    /usr/lib/golang/src/runtime/panic.go:890 +0x263
github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0003b5760, 0x2?, {0xc0009bc7d1, 0x11}, {0x3a41fa0, 0xc0002f6c40}, 0xb?)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582
github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xaa00000000000010?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d
github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0002f6c40?}, 0x7?)
    /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33
net/http.HandlerFunc.ServeHTTP(...)
    /usr/lib/golang/src/net/http/server.go:2122
github.com/openshift/console/pkg/server.authMiddleware.func1(0xc0001f7500?, {0x3a41fa0?, 0xc0002f6c40?}, 0xd?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31
github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7500)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c
net/http.HandlerFunc.ServeHTTP(0x5120938?, {0x3a41fa0?, 0xc0002f6c40?}, 0x7ffb6ea27f18?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.StripPrefix.func1({0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400)
    /usr/lib/golang/src/net/http/server.go:2165 +0x332
net/http.HandlerFunc.ServeHTTP(0xc001102c00?, {0x3a41fa0?, 0xc0002f6c40?}, 0xc000655a00?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400)
    /usr/lib/golang/src/net/http/server.go:2500 +0x149
github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0002f6c40}, 0x3305040?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af
net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0002f6c40?}, 0x11db52e?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.serverHandler.ServeHTTP({0xc0008201e0?}, {0x3a41fa0, 0xc0002f6c40}, 0xc0001f7400)
    /usr/lib/golang/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc0009b4120, {0x3a43e70, 0xc001223500})
    /usr/lib/golang/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
    /usr/lib/golang/src/net/http/server.go:3089 +0x5ed
I0915 12:50:24.267777       1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data.
I0915 12:50:24.267813       1 handlers.go:118] User settings ConfigMap "user-settings-4b4c2f4d-159c-4358-bba3-3d87f113cd9b" already exist, will return existing data.
E0915 12:50:30.155515       1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::f735]:9443: connect: connection refused
2023/09/15 12:50:30 http: panic serving [fd01:0:0:1::2]:42990: runtime error: invalid memory address or nil pointer dereference 

9443 port is Connection refused

$ oc -n openshift-monitoring get pod -o wide
NAME                                                     READY   STATUS    RESTARTS   AGE     IP                  NODE    NOMINATED NODE   READINESS GATES
alertmanager-main-0                                      6/6     Running   6          3d22h   fd01:0:0:1::564     sno-2   <none>           <none>
cluster-monitoring-operator-6cb777d488-nnpmx             1/1     Running   4          7d16h   fd01:0:0:1::12      sno-2   <none>           <none>
kube-state-metrics-dc5f769bc-p97m7                       3/3     Running   12         7d16h   fd01:0:0:1::3b      sno-2   <none>           <none>
monitoring-plugin-85bfb98485-d4g5x                       1/1     Running   4          7d16h   fd01:0:0:1::55      sno-2   <none>           <none>
node-exporter-ndnnj                                      2/2     Running   8          7d16h   2620:52:0:165::41   sno-2   <none>           <none>
openshift-state-metrics-78df59b4d5-j6r5s                 3/3     Running   12         7d16h   fd01:0:0:1::3a      sno-2   <none>           <none>
prometheus-adapter-6f86f7d8f5-ttflf                      1/1     Running   0          4h23m   fd01:0:0:1::b10c    sno-2   <none>           <none>
prometheus-k8s-0                                         6/6     Running   6          3d22h   fd01:0:0:1::566     sno-2   <none>           <none>
prometheus-operator-7c94855989-csts2                     2/2     Running   8          7d16h   fd01:0:0:1::39      sno-2   <none>           <none>
prometheus-operator-admission-webhook-7bb64b88cd-bvq8m   1/1     Running   4          7d16h   fd01:0:0:1::37      sno-2   <none>           <none>
thanos-querier-5bbb764599-vlztq                          6/6     Running   6          3d22h   fd01:0:0:1::56a     sno-2   <none>           <none>

$  oc -n openshift-monitoring get svc monitoring-plugin
NAME                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
monitoring-plugin   ClusterIP   fd02::f735   <none>        9443/TCP   7d16h


$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq
*   Trying fd02::f735...
* TCP_NODELAY set
* connect to fd02::f735 port 9443 failed: Connection refused
* Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
command terminated with exit code 7

No such issue on another 4.14.0-rc.0 IPv4 cluster, but the issue reproduced on another 4.14.0-rc.0 IPv6 cluster.
4.14.0-rc.0 ipv4 cluster,

$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-rc.0   True        False         20m     Cluster version is 4.14.0-rc.0

$ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin
monitoring-plugin-85bfb98485-nh428                       1/1     Running   0          4m      10.128.0.107   ci-ln-pby4bj2-72292-l5q8v-master-0   <none>           <none>

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k  'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq
...
{
  "name": "monitoring-plugin",
  "version": "1.0.0",
  "displayName": "OpenShift console monitoring plugin",
  "description": "This plugin adds the monitoring UI to the OpenShift web console",
  "dependencies": {
    "@console/pluginAPI": "*"
  },
  "extensions": [
    {
      "type": "console.page/route",
      "properties": {
        "exact": true,
        "path": "/monitoring",
        "component": {
          "$codeRef": "MonitoringUI"
        }
      }
    },
...

meet issue "9443: Connection refused" in 4.14.0-rc.0 ipv6 cluster(launched cluster-bot cluster: launch 4.14.0-rc.0 metal,ipv6) and login console

$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-rc.0   True        False         44m     Cluster version is 4.14.0-rc.0
$ oc -n openshift-monitoring get pod -o wide | grep monitoring-plugin
monitoring-plugin-bd6ffdb5d-b5csk                        1/1     Running   0          53m   fd01:0:0:4::b             worker-0.ostest.test.metalkube.org   <none>           <none>
monitoring-plugin-bd6ffdb5d-vhtpf                        1/1     Running   0          53m   fd01:0:0:5::9             worker-2.ostest.test.metalkube.org   <none>           <none>
$ oc -n openshift-monitoring get svc monitoring-plugin
NAME                TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
monitoring-plugin   ClusterIP   fd02::402d   <none>        9443/TCP   59m

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -v 'https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json' | jq
*   Trying fd02::402d...
* TCP_NODELAY set
* connect to fd02::402d port 9443 failed: Connection refused
* Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
* Closing connection 0
curl: (7) Failed to connect to monitoring-plugin.openshift-monitoring.svc.cluster.local port 9443: Connection refused
command terminated with exit code 7
$ oc -n openshift-console get pod | grep console
console-5cffbc7964-7ljft     1/1     Running   0          56m
console-5cffbc7964-d864q     1/1     Running   0          56m
$ oc -n openshift-console logs console-5cffbc7964-7ljft
...
E0916 14:34:16.330117       1 handlers.go:164] GET request for "monitoring-plugin" plugin failed: Get "https://monitoring-plugin.openshift-monitoring.svc.cluster.local:9443/plugin-manifest.json": dial tcp [fd02::402d]:9443: connect: connection refused
2023/09/16 14:34:16 http: panic serving [fd01:0:0:4::2]:37680: runtime error: invalid memory address or nil pointer dereference
goroutine 3985 [running]:
net/http.(*conn).serve.func1()
    /usr/lib/golang/src/net/http/server.go:1854 +0xbf
panic({0x3259140, 0x4fcc150})
    /usr/lib/golang/src/runtime/panic.go:890 +0x263
github.com/openshift/console/pkg/plugins.(*PluginsHandler).proxyPluginRequest(0xc0008f6780, 0x2?, {0xc000665211, 0x11}, {0x3a41fa0, 0xc0009221c0}, 0xb?)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:165 +0x582
github.com/openshift/console/pkg/plugins.(*PluginsHandler).HandlePluginAssets(0xfe00000000000010?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d600)
    /go/src/github.com/openshift/console/pkg/plugins/handlers.go:147 +0x26d
github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func23({0x3a41fa0?, 0xc0009221c0?}, 0x7?)
    /go/src/github.com/openshift/console/pkg/server/server.go:604 +0x33
net/http.HandlerFunc.ServeHTTP(...)
    /usr/lib/golang/src/net/http/server.go:2122
github.com/openshift/console/pkg/server.authMiddleware.func1(0xc000d8d600?, {0x3a41fa0?, 0xc0009221c0?}, 0xd?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:25 +0x31
github.com/openshift/console/pkg/server.authMiddlewareWithUser.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d600)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:81 +0x46c
net/http.HandlerFunc.ServeHTTP(0xc000653830?, {0x3a41fa0?, 0xc0009221c0?}, 0x7f824506bf18?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.StripPrefix.func1({0x3a41fa0, 0xc0009221c0}, 0xc000d8d500)
    /usr/lib/golang/src/net/http/server.go:2165 +0x332
net/http.HandlerFunc.ServeHTTP(0xc00007e800?, {0x3a41fa0?, 0xc0009221c0?}, 0xc000b2da00?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.(*ServeMux).ServeHTTP(0x34025e0?, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500)
    /usr/lib/golang/src/net/http/server.go:2500 +0x149
github.com/openshift/console/pkg/server.securityHeadersMiddleware.func1({0x3a41fa0, 0xc0009221c0}, 0x3305040?)
    /go/src/github.com/openshift/console/pkg/server/middleware.go:128 +0x3af
net/http.HandlerFunc.ServeHTTP(0x0?, {0x3a41fa0?, 0xc0009221c0?}, 0x11db52e?)
    /usr/lib/golang/src/net/http/server.go:2122 +0x2f
net/http.serverHandler.ServeHTTP({0xc000db9b00?}, {0x3a41fa0, 0xc0009221c0}, 0xc000d8d500)
    /usr/lib/golang/src/net/http/server.go:2936 +0x316
net/http.(*conn).serve(0xc000653680, {0x3a43e70, 0xc000676f30})
    /usr/lib/golang/src/net/http/server.go:1995 +0x612
created by net/http.(*Server).Serve
    /usr/lib/golang/src/net/http/server.go:3089 +0x5ed 

Version-Release number of selected component (if applicable):

baremetal 4.14.0-rc.0 ipv6 sno cluster,
$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=virt_platform'  | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "virt_platform",
          "baseboard_manufacturer": "Dell Inc.",
          "baseboard_product_name": "01J4WF",
          "bios_vendor": "Dell Inc.",
          "bios_version": "1.10.2",
          "container": "kube-rbac-proxy",
          "endpoint": "https",
          "instance": "sno-2",
          "job": "node-exporter",
          "namespace": "openshift-monitoring",
          "pod": "node-exporter-ndnnj",
          "prometheus": "openshift-monitoring/k8s",
          "service": "node-exporter",
          "system_manufacturer": "Dell Inc.",
          "system_product_name": "PowerEdge R750",
          "system_version": "Not Specified",
          "type": "none"
        },
        "value": [
          1694785092.664,
          "1"
        ]
      }
    ]
  }
}

How reproducible:

only seen on this cluster

Steps to Reproduce:

1. see the description
2.
3.

Actual results:

no Observe menu on admin console, monitoring-plugin is failed

Expected results:

no error
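
The stack traces above show handlers.go dereferencing the result of a GET that failed with connection refused. A minimal Go sketch of a defensive version, assuming the handler does a plain HTTP GET to the plugin service (names are illustrative, not the console's actual code):

package plugins

import (
    "io"
    "net/http"
)

// proxyPluginManifest guards against a nil *http.Response: when the dial
// fails (connection refused, as in the logs above), resp is nil and must not
// be dereferenced; answer 502 instead of panicking.
func proxyPluginManifest(w http.ResponseWriter, manifestURL string) {
    resp, err := http.Get(manifestURL)
    if err != nil {
        http.Error(w, "failed to reach plugin service: "+err.Error(), http.StatusBadGateway)
        return
    }
    defer resp.Body.Close()
    w.WriteHeader(resp.StatusCode)
    io.Copy(w, resp.Body) // forward the manifest body as-is
}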

Description of problem:

CredentialsRequest for Azure AD Workload Identity contains unnecessary network permissions.

- Microsoft.Network/applicationSecurityGroups/delete
- Microsoft.Network/applicationSecurityGroups/write
- Microsoft.Network/loadBalancers/delete
- Microsoft.Network/networkSecurityGroups/delete
- Microsoft.Network/routeTables/delete
- Microsoft.Network/routeTables/write
- Microsoft.Network/virtualNetworks/subnets/delete
- Microsoft.Network/virtualNetworks/subnets/write
- Microsoft.Network/virtualNetworks/write
- Microsoft.Resources/subscriptions/resourceGroups/delete
- Microsoft.Resources/subscriptions/resourceGroups/write

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

N/A

Steps to Reproduce:

1. Remove above permissions from the Azure Credentials request and validate that MAO continues to function in Azure AD Workload Identity cluster.

Actual results:

Unnecessary network write permissions enumerated in CredentialsRequest.

Expected results:

Only necessary permissions enumerated in CredentialsRequest.

Additional info:

Additional unnecessary permissions will be hard to pinpoint, but these specific permissions were questioned by MSFT and are likely only needed by the installer, as indicated by the CORS-1870 investigation.

Description of problem:

The environment variable OPENSHIFT_IMG_OVERRIDES is not retaining the order of mirrors listed under a source compared to the original mirror/source listing in the ICSP/IDMSs.

Version-Release number of selected component (if applicable):

 

How reproducible:

Every time

Steps to Reproduce:

1. Setup a mgmt cluster with either an ICSP like:

  apiVersion: operator.openshift.io/v1alpha1
  kind: ImageContentSourcePolicy
  metadata:
    name: image-policy-39
  spec:
    repositoryDigestMirrors:
    - mirrors:
      - quay.io/openshift-release-dev/ocp-release
      - pull.q1w2.quay.rhcloud.com/openshift-release-dev/ocp-release
      source: quay.io/openshift-release-dev/ocp-release

2. Create a Hosted Cluster

Actual results:

Nodes cannot join the cluster because ignition cannot be generated

Expected results:

Nodes can join the cluster

Additional info:

Issue is most likely coming from here - https://github.com/openshift/hypershift/blob/dce6f51355317173be6bc525edfe059572c07690/support/util/util.go#L224
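
A minimal Go sketch of order-preserving override handling, assuming OPENSHIFT_IMG_OVERRIDES carries comma-separated source=mirror pairs; the types and parsing are illustrative, not the hypershift util code linked above. The point is to keep mirrors in a slice in first-seen order rather than collecting them into a map, whose iteration order is random:

package overrides

import "strings"

// mirrorSet keeps the mirrors for one source in the order they appeared in
// the ICSP/IDMS instead of losing that order in a map.
type mirrorSet struct {
    Source  string
    Mirrors []string
}

// parse turns comma-separated "source=mirror" pairs into ordered sets.
func parse(env string) []mirrorSet {
    ordered := []mirrorSet{}
    index := map[string]int{}
    for _, pair := range strings.Split(env, ",") {
        parts := strings.SplitN(pair, "=", 2)
        if len(parts) != 2 {
            continue
        }
        source, mirror := parts[0], parts[1]
        i, ok := index[source]
        if !ok {
            i = len(ordered)
            index[source] = i
            ordered = append(ordered, mirrorSet{Source: source})
        }
        ordered[i].Mirrors = append(ordered[i].Mirrors, mirror)
    }
    return ordered
}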

Description of problem:

Terraform will not create VMs for the master and worker nodes in a UPI vSphere install when var.control_plane_ip_addresses and var.compute_ip_addresses are unset. When users rely on IPAM (as before) to reserve IPs instead of setting static IPs directly in var.control_plane_ip_addresses and var.compute_ip_addresses, the count of masters and workers is always 0 (based on upstream code #1 and #2), so Terraform creates no VMs at all. If we change the code as below, the IPAM case works as before:

control_plane_fqdns = [for idx in range(length(var.control_plane_ip_addresses)) : "control-plane-${idx}.${var.cluster_domain}"]
compute_fqdns = [for idx in range(length(var.compute_ip_addresses)) : "compute-${idx}.${var.cluster_domain}"]

==>>

control_plane_fqdns = [for idx in range(var.control_plane_count) : "control-plane-${idx}.${var.cluster_domain}"]
compute_fqdns = [for idx in range(var.compute_count) : "compute-${idx}.${var.cluster_domain}"]

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-03-11-033820

How reproducible:

always

Steps to Reproduce:

1.Trigger job to install a cluster on vSphere with upi.
2.If the ip applied for master and worker VMs from IPAM server instead of setting the static ip directly into var.control_plane_ip_addresses and var.compute_ip_addresses, the VM creation will fail. 

Actual results:

the VM creation will fail

Expected results:

VM creation succeeds.

Additional info:

#1 link:https://github.com/openshift/installer/blob/master/upi/vsphere/main.tf#L15-L16
#2 link:https://github.com/openshift/installer/blob/master/upi/vsphere/main.tf#L211
This bug only affects UPI vSphere installation when the user uses an IPAM server to reserve static IPs instead of setting static IPs directly in var.control_plane_ip_addresses and var.compute_ip_addresses. It does not currently affect QE tests, because we still install with the previous code.

 

Description of problem:

The cluster-api pod can't create events due to RBAC restrictions; we may miss some useful events because of this.
E0503 07:20:44.925786       1 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"ad1-workers-f5f568855-vnzmn.175b911e43aa3f41", GenerateName:"", Namespace:"ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Machine", Namespace:"ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1", Name:"ad1-workers-f5f568855-vnzmn", UID:"2b40a694-d36d-4b13-9afc-0b5daeecc509", APIVersion:"cluster.x-k8s.io/v1beta1", ResourceVersion:"144260357", FieldPath:""}, Reason:"DetectedUnhealthy", Message:"Machine ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1/ad1-workers/ad1-workers-f5f568855-vnzmn/ has unhealthy node ", Source:v1.EventSource{Component:"machinehealthcheck-controller", Host:""}, FirstTimestamp:time.Date(2023, time.May, 3, 7, 20, 44, 923289409, time.Local), LastTimestamp:time.Date(2023, time.May, 3, 7, 20, 44, 923289409, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:serviceaccount:ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1:cluster-api" cannot create resource "events" in API group "" in the namespace "ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1"' (will not retry!)

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Always

Steps to Reproduce:

1. Create a hosted cluster
2. Check cluster-api pod for some kind of error (e.g. slow node startup)
3.

Actual results:

Error

Expected results:

Event generated

Additional info:
ClusterRole hypershift-cluster-api is created here https://github.com/openshift/hypershift/blob/e7eb32f259b2a01e5bbdddf2fe963b82b331180f/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go#L2720

We should add create/patch/update for events there
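
A minimal Go sketch of the missing rule, intended to be appended to the rules of the hypershift-cluster-api ClusterRole referenced above; where exactly it is wired in is left to the actual fix:

package rbac

import rbacv1 "k8s.io/api/rbac/v1"

// eventsRule grants the verbs the event recorder needs in the hosted control
// plane namespace so machinehealthcheck events are no longer rejected.
var eventsRule = rbacv1.PolicyRule{
    APIGroups: []string{""}, // core API group
    Resources: []string{"events"},
    Verbs:     []string{"create", "patch", "update"},
}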

Description of problem:

When HyperShift HostedClusters are created with "OLMCatalogPlacement" set to "guest" and if the desired release is pre-GA, the CatalogSource pods cannot pull their images due to using unreleased images.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Common

Steps to Reproduce:

1. Create a HyperShift 4.13 HostedCluster with spec.OLMCatalogPlacement = "guest"
2. See the openshift-marketplace/community-operator-* pods in the guest cluster in ImagePullBackoff

Actual results:

openshift-marketplace/community-operator-* pods in the guest cluster in ImagePullBackoff

Expected results:

All CatalogSource pods to be running and to use n-1 images if pre-GA

Additional info:

 

Description of problem:

This Jira is filed to track upstream issue (fix and backport) https://github.com/kubernetes-sigs/azurefile-csi-driver/issues/1308

Version-Release number of selected component (if applicable):

4.14

Description of problem:

Tests like lint and vet used to be run within a container engine by default if an engine was detected, both locally and in CI. Up until now no container engine was detected in CI, so tests would run natively there. Now that the base image we use in CI has started shipping `podman`, a container engine is detected by default and tests are run within podman. But creating nested containers doesn't work in CI at the moment and thus results in a test failure. As such we are switching the default behaviour for tests (both locally and in CI): by default no container engine is used to run tests, even if one is detected, and tests are run natively unless otherwise specified.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

maxUnavailable defaults to 50% for anything under 4: https://github.com/openshift/cluster-ingress-operator/blob/master/pkg/operator/controller/ingress/poddisruptionbudget.go#L71

Based on PDB rounding logic, it always rounds up to the next whole integer, so 1.5 becomes 2.

spec:
  maxUnavailable: 50%
  selector:
    matchLabels:
      ingresscontroller.operator.openshift.io/deployment-ingresscontroller: default
  currentHealthy: 3
  desiredHealthy: 1
  disruptionsAllowed: 2

Whereas with 4 router pods, we only allow 1 of 4 to be disrupted at a time.

Version-Release number of selected component (if applicable):

4.x

How reproducible:

Always

Steps to Reproduce:

1. Set 3 replicas
2. Look at the disruptionsAllowed on the PDB

Actual results:

You can take down 2 of 3 routers at once, leaving no HA.

Expected results:

With 3+ routers, we should always ensure 2 are up with the PDB.

Additional info:

Reduce the maxUnavailable to 25% for >= 3 pods instead of 4
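
A minimal Go sketch of the rounding behaviour described above and of the proposed 25% threshold; the helper names are illustrative, not the ingress operator's code:

package pdb

import "math"

// allowedDisruptions rounds the percentage up to the next whole pod, as the
// report describes: 50% of 3 replicas allows 2 pods to be disrupted.
func allowedDisruptions(replicas int, percent float64) int {
    return int(math.Ceil(float64(replicas) * percent / 100))
}

// proposedMaxUnavailable keeps 50% below 3 replicas and drops to 25% at 3 or
// more, so allowedDisruptions(3, 25) == 1 and at least 2 of 3 routers stay up.
func proposedMaxUnavailable(replicas int) float64 {
    if replicas >= 3 {
        return 25
    }
    return 50
}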

Description of the problem:

In Staging, deleting a host in the UI -> the host re-registers after ~15 mins.

How reproducible:

100%

Steps to reproduce:

1. Before cluster installation, delete random host using UI

2. Wait 15 mins

3. The host re-registers without rebooting

Actual results:

The agent automatically registers itself after 15 min

Expected results:

The agent should register again only after a reboot

Description of problem:

thanos-sidecar is panicking after the image was rebuilt in this payload https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-04-18-045408


Example job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-sdn-bm/1648276769645531136

Logs:
  - containerID: cri-o://c62dcc73b8203bfd968ffca95bba8607e24a06492948a0179cde6a57a897d431
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a007b49153ee517ab4fe0600d217832bac0fd6152b5a709da291b60c82a5875d
    imageID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a007b49153ee517ab4fe0600d217832bac0fd6152b5a709da291b60c82a5875d
    lastState:
      terminated:
        containerID: cri-o://c62dcc73b8203bfd968ffca95bba8607e24a06492948a0179cde6a57a897d431
        exitCode: 2
        finishedAt: '2023-04-18T12:30:20Z'
        message: "panic: Something in this program imports go4.org/unsafe/assume-no-moving-gc\
          \ to declare that it assumes a non-moving garbage collector, but your version\
          \ of go4.org/unsafe/assume-no-moving-gc hasn't been updated to assert that\
          \ it's safe against the go1.20 runtime. If you want to risk it, run with\
          \ environment variable ASSUME_NO_MOVING_GC_UNSAFE_RISK_IT_WITH=go1.20 set.\
          \ Notably, if go1.20 adds a moving garbage collector, this program is unsafe\
          \ to use.\n\ngoroutine 1 [running]:\ngo4.org/unsafe/assume-no-moving-gc.init.0()\n\
          \t/go/src/github.com/improbable-eng/thanos/vendor/go4.org/unsafe/assume-no-moving-gc/untested.go:25\
          \ +0x1ba\n"
        reason: Error
        startedAt: '2023-04-18T12:30:20Z'
    name: thanos-sidecar
    ready: false
    restartCount: 14
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=thanos-sidecar pod=prometheus-k8s-0_openshift-monitoring(bafeb85b-3980-4153-90bc-a302b93c3465)
        reason: CrashLoopBackOff

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-04-18-045408

How reproducible:

Always

Steps to Reproduce:

1. Install 4.14.0-0.nightly-2023-04-18-045408

Actual results:

thanos-sidecar panics and cluster doesn't install

Expected results:

 

Additional info:

 

Description of problem:

Bump Kubernetes to 0.27.1 and bump dependencies

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of the problem:

As discussed on the Github PR, we want to align the severities filter with the previous implementation. Therefore the severity counts in the response headers should be:

  • the total counts of events with the respective severity across all possible pages
  • with regards to the applied filters (hosts, cluster-level, message,...)
  • but they should not take the severities filter itself into account.

In addition to that, we need a new response header with a total number of events with all current filters (severities included) applied.
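
A minimal Go sketch of the header computation, assuming the events passed in already match the host/cluster-level/message filters; the types and names are illustrative, not the service's actual code:

package events

// Event is a stand-in for the service's event model; only the field needed
// for the sketch is shown.
type Event struct {
    Severity string
}

// severityCounts returns per-severity counts that ignore the severities
// filter, plus a grand total that honours every filter, severities included.
func severityCounts(events []Event, selectedSeverities map[string]bool) (perSeverity map[string]int, total int) {
    perSeverity = map[string]int{}
    for _, e := range events {
        perSeverity[e.Severity]++
        if len(selectedSeverities) == 0 || selectedSeverities[e.Severity] {
            total++
        }
    }
    return perSeverity, total
}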

Description of problem: The openshift-manila-csi-driver namespace should have the "workload.openshift.io/allowed=management" label.

This is currently not the case:

❯ oc describe ns openshift-manila-csi-driver  
Name:         openshift-manila-csi-driver
Labels:       kubernetes.io/metadata.name=openshift-manila-csi-driver
              pod-security.kubernetes.io/audit=privileged
              pod-security.kubernetes.io/enforce=privileged
              pod-security.kubernetes.io/warn=privileged
Annotations:  include.release.openshift.io/self-managed-high-availability: true
              openshift.io/node-selector: 
              openshift.io/sa.scc.mcs: s0:c24,c4
              openshift.io/sa.scc.supplemental-groups: 1000560000/10000
              openshift.io/sa.scc.uid-range: 1000560000/10000
Status:       Active

No resource quota.

No LimitRange resource.

It is causing CI jobs to fail with:

{  fail [github.com/openshift/origin/test/extended/cpu_partitioning/platform.go:82]: projects [openshift-manila-csi-driver] do not contain the annotation map[workload.openshift.io/allowed:management]
Expected
    <[]string | len:1, cap:1>: [
        "openshift-manila-csi-driver",
    ]
to be empty
Ginkgo exit error 1: exit with code 1}

For instance https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/27831/pull-ci-openshift-origin-release-4.13-e2e-openstack-ovn/1641317874201006080.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Description of problem:

The Route checkbox is shown as checked even if it was unchecked while editing the Serverless Function form.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Install Serverless Operator and Create KN Serving Instance
2. Create a Serverless Function and open the Edit form of the KSVC
3. Uncheck the Create Route option and save.
4. Reopen the Edit form again.

Actual results:

The checkbox still shows checked.

Expected results:

It should retain the previous condition.

Additional info:

 

I am using a BuildConfig with git source and the Docker strategy. The git repo contains a large zip file via LFS and that zip file is not getting downloaded. Instead just the ascii metadata is getting downloaded. I've created a simple reproducer (https://github.com/selrahal/buildconfig-git-lfs) on my personal github. If you clone the repo

git clone git@github.com:selrahal/buildconfig-git-lfs.git

and apply the bc.yaml file with

oc apply -f bc.yaml

Then start the build with

oc start-build test-git-lfs

You will see the build fails at the unzip step in the docker file

STEP 3/7: RUN unzip migrationtoolkit-mta-cli-5.3.0-offline.zip
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.

I've attached the full build logs to this issue.

LatencySensitive has been functionally equivalent to "" (Default) for several years. Code has forgotten that the featureset must be handled, and it's more efficacious to remove the featureset (with migration code) than to try to plug all the holes.

To ensure this is working, update a cluster to use LatencySensitive and see that the FeatureSet value is reset after two minutes.

Please review the following PR: https://github.com/openshift/cloud-provider-aws/pull/37

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Application groups can not be deleted in topology

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:

1. Create an application with an application group
2. Go to topology 
3. Delete the application group containing the application

Actual results:

Application group persists in topology

Expected results:

The application group should be deleted

Additional info:

Pipeline API is giving 404 even if the pipelines operator is not installed

Description of problem:

Please check: https://issues.redhat.com/browse/OCPBUGS-18702?focusedId=23021716&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-23021716 for more details.

https://drive.google.com/drive/folders/14aSJs-lO6HC-2xYFlOTJtCZIQg3ekE85?usp=sharing (please check the recording "sc_form_typeerror.mp4").
Issues:
1. TypeError mentioned above.
2. Default params added by an extension are not getting added to the created StorageClass.
3. Validation for parameters added by an extension is not working correctly either.
4. The Provisioner child details get stuck once the user selects 'openshift-storage.cephfs.csi.ceph.com'.

Version-Release number of selected component (if applicable):

4.14 (OCP)

How reproducible:

 

Steps to Reproduce:

1. Install ODF operator.
2. Create StorageSystem (once dynamic plugin is loaded).
3. Wait for a while for ODF related StorageClasses gets created.
4. Once they are created, go to "Create StorageSystem" form.
5. Switch to provisioners (rbd.csi.ceph) added by ODF dynamic plugin. 

Actual results:

Page breaks with an error.

Expected results:

Page should not break.
And the functionality should behave as it did before the refactoring introduced by PR: https://github.com/openshift/console/pull/13036

Additional info:

Stack trace:
Caught error in a child component: TypeError: Cannot read properties of undefined (reading 'parameters')
    at allRequiredFieldsFilled (storage-class-form.tsx:204:1)
    at validateForm (storage-class-form.tsx:235:1)
    at storage-class-form.tsx:262:1
    at invokePassiveEffectCreate (react-dom.development.js:23487:1)
    at HTMLUnknownElement.callCallback (react-dom.development.js:3945:1)
    at Object.invokeGuardedCallbackDev (react-dom.development.js:3994:1)
    at invokeGuardedCallback (react-dom.development.js:4056:1)
    at flushPassiveEffectsImpl (react-dom.development.js:23574:1)
    at unstable_runWithPriority (scheduler.development.js:646:1)
    at runWithPriority$1 (react-dom.development.js:11276:1) {componentStack: '\n    at StorageClassFormInner (http://localhost:90...c03030668ef271da51f.js:491534:20)\n    at Suspense'}

Description of problem:

This bug is for the OCPNODE-1799 backport to 4.14.z.

backport https://github.com/openshift/machine-config-operator/pull/3898

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

STS cluster awareness was in tech preview for testing and quality assurance before release. The unit tests created and their runs have indicated no change in cluster operation. QE has reported several bugs and they've been fixed. A periodic e2e test, which verifies that a Secret is generated when an STS cluster is detected and proper AWS resource access tokens are present in the CredentialsRequest, has been passing and has also passed when run manually on several follow-on PRs.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-20342. The following is the description of the original issue:

Description of problem:

As a part of the forbidden node label e2e test, we execute the `oc debug` command to set forbidden labels on the node. The `oc debug` command is expected to fail while applying the forbidden label.

 

In our testing, we observed that even though the actual command on the node (kubectl label node/<node> <forbidden_label>) fails as expected, the `oc debug` command does not propagate the return code correctly (it returns 0 even though `kubectl label` fails with an error).

Version-Release number of selected component (if applicable):

4.14

How reproducible:

flaky

Steps to Reproduce:

1. Run the test at https://gist.github.com/harche/c9143c382cfe94d7836414d5ccc0ba45
2. Observe that sometimes it flakes at https://gist.github.com/harche/c9143c382cfe94d7836414d5ccc0ba45#file-test-go-L39

Actual results:

oc debug return value flakes

Expected results:

oc debug return value should be consistent. 

Additional info:

 

Description of problem:

When we expand a baremetal IPI cluster with static IPs, no information is logged if the nmstate output is "--- {}\n", and the customized image is generated without the static network configuration.

Version-Release number of selected component (if applicable):

4.11

How reproducible:

100%

Steps to Reproduce:

1. Expand the baremetal IPI cluster with a node using the invalid nmstate data below.
   ---
   apiVersion: v1
   kind: Secret
   metadata:
    name: openshift-worker-0-network-config-secret
   type: Opaque
   stringData:
    nmstate: |
     foo:
      bar: baz
   ---
   apiVersion: v1
   kind: Secret
   metadata:
     name: openshift-worker-0-bmc-secret
     namespace: openshift-machine-api
   type: Opaque
   data:
     username: YWRtaW4K
     password: cGFzc3dvcmQK
   ---
   apiVersion: metal3.io/v1alpha1
   kind: BareMetalHost
   metadata:
     name: openshift-worker-0
     namespace: openshift-machine-api
   spec:
     online: True
     bootMACAddress: 52:54:00:11:22:b4
     bmc:
       address: ipmi://192.168.123.1:6233
       credentialsName: openshift-worker-0-bmc-secret
       disableCertificateVerification: True
       username: admin
       password: password
     rootDeviceHints:
       deviceName: "/dev/sda"
     preprovisioningNetworkDataName: openshift-worker-0-network-config-secret

2. Check if an IP is configured with the node
3.

Actual results:

No static network configuration in the metal3 customized image.

Expected results:

Information should be logged and the metal3 customized image should not be generated.

Additional info:

https://github.com/openshift/image-customization-controller/pull/72
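
A minimal Go sketch of the kind of guard the expected behaviour implies, assuming the controller captures the nmstate conversion output as a string; names are illustrative, not the image-customization-controller's actual code:

package netconfig

import (
    "fmt"
    "strings"
)

// checkNmstateOutput treats an effectively empty rendering such as "--- {}\n"
// as an error, so the caller logs the problem instead of silently building an
// image without any static network configuration.
func checkNmstateOutput(out string) error {
    trimmed := strings.TrimSpace(out)
    if trimmed == "" || trimmed == "---" || trimmed == "--- {}" {
        return fmt.Errorf("nmstate produced no network configuration: %q", out)
    }
    return nil
}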

Description of problem:
When adding a "Git Repository" (a tekton or pipelines Repository) and enter a GitLab or Bitbucket PAC repository the created Repository resource is invalid.

Version-Release number of selected component (if applicable):
411-4.13

How reproducible:
Always

Steps to Reproduce:
Setup a PAC git repo, you can mirror these projects if you want: https://github.com/jerolimov/nodeinfo-pac

For GitHub you need to set up:

  1. an account-global "private access token" > a classic access token, see https://github.com/settings/tokens
  2. a repo > webhook

For GitLab:

  1. a repo > Project Access Tokens
  2. a repo > webhook

For Bitbucket:

  1. an account-global "app password, see https://bitbucket.org/account/settings/app-passwords/
  2. a repo > webhook

On a cluster bot instance:

  1. Install OpenShift Pipelines operator
  2. Navigate to Developer perspective > Pipelines
  3. Select Create > Repository
  4. Enter a GitLab based git repository with Git access token and Webhook secret
  5. Enter a Bitbucket based git repository with Git access token (webhook secret isn't supported)

Actual results:
The GitLab created resource looks like this:

apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: gitlab-nodeinfo-pac
spec:
  git_provider:
    secret:
      key: provider.token
      name: gitlab-nodeinfo-pac-token-gfr66
    url: gitlab.com   # missing schema
    webhook_secret:
      key: webhook.secret
      name: gitlab-nodeinfo-pac-token-gfr66
  url: 'https://gitlab.com/jerolimov/nodeinfo-pac'

The Bitbucket resource looks like this:

apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: bitbucket-nodeinfo-pac
spec:
  git_provider:
    secret:
      key: provider.token
      name: bitbucket-nodeinfo-pac-token-9pf75
    url: bitbucket.org   # missing schema and invalid API URL !
    webhook_secret:   # don't entered a webhook URL, see OCPBUGS-7035
      key: webhook.secret
      name: bitbucket-nodeinfo-pac-token-9pf75
  url: 'https://bitbucket.org/jerolimov/nodeinfo-pac'

The pipeline-as-code controller Pod log contains some error messages and no PipelineRun is created.

Expected results:
For GitLab:

  1. The spec.git_provider.url should contain the schema https://, so it should be https://gitlab.com, or can be removed completely. Both work fine.
    A working example:
apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: gitlab-nodeinfo-pac
spec:
  git_provider:
    secret:
      key: provider.token
      name: gitlab-nodeinfo-pac-token-gfr66
    url: https://gitlab.com
    webhook_secret:
      key: webhook.secret
      name: gitlab-nodeinfo-pac-token-gfr66
  url: 'https://gitlab.com/jerolimov/nodeinfo-pac'

Bitbucket:

  1. The spec.git_provider.url should be https://api.bitbucket.org/2.0, or can be removed completely. Both work fine.
  2. The Account Secret needs also a Bitbucket login name, passed as spec.git_provider.user.

A working example:

apiVersion: pipelinesascode.tekton.dev/v1alpha1
kind: Repository
metadata:
  name: bitbucket-nodeinfo-pac
spec:
  git_provider:
    user: jerolimov
    secret:
      key: provider.token
      name: bitbucket-nodeinfo-pac-token-9pf75
    webhook_secret:
      key: webhook.secret
      name: bitbucket-nodeinfo-pac-token-9pf75
  url: 'https://bitbucket.org/jerolimov/nodeinfo-pac'

A PipelineRun should be created for each push to the git repo.

Additional info:

1. Bitbucket uses a lowercase second "b".
  2. For the Bitbucket issue see also https://github.com/openshift-pipelines/pipelines-as-code/issues/416

Please review the following PR: https://github.com/openshift/router/pull/453

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-18999. The following is the description of the original issue:

Description of problem:

Image pulls fail with HTTP status 504 (gateway timeout) until the image registry pods are restarted.

Version-Release number of selected component (if applicable):

4.13.12

How reproducible:

Intermittent

Steps to Reproduce:

1.
2.
3.

Actual results:

Images can't be pulled: 
podman pull registry.ci.openshift.org/ci/applyconfig:latest
Trying to pull registry.ci.openshift.org/ci/applyconfig:latest...
Getting image source signatures
Error: reading signatures: downloading signatures for sha256:83c1b636069c3302f5ba5075ceeca5c4a271767900fee06b919efc3c8fa14984 in registry.ci.openshift.org/ci/applyconfig: received unexpected HTTP status: 504 Gateway Time-out


Image registry pods contain errors:
time="2023-09-01T02:25:39.596485238Z" level=warning msg="error authorizing context: access denied" go.version="go1.19.10 X:strictfipsruntime" http.request.host=registry.ci.openshift.org http.request.id=3e805818-515d-443f-8d9b-04667986611d http.request.method=GET http.request.remoteaddr=18.218.67.82 http.request.uri="/v2/ocp/4-dev-preview/manifests/sha256:caf073ce29232978c331d421c06ca5c2736ce5461962775fdd760b05fb2496a0" http.request.useragent="containers/5.24.1 (github.com/containers/image)" vars.name=ocp/4-dev-preview vars.reference="sha256:caf073ce29232978c331d421c06ca5c2736ce5461962775fdd760b05fb2496a0"

Expected results:

Image registry does not return gateway timeouts

Additional info:

Must gather(s) attached, additional information in linked OHSS ticket.

 

This is a clone of issue OCPBUGS-23083. The following is the description of the original issue:

Description of problem:

When the network type is Calico for a hosted cluster, the rbac policies that are laid down for CNO do not include permissions to deploy network-node-identity

Version-Release number of selected component (if applicable):

 

How reproducible: IBM Satellite environment

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/coredns/pull/89

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The test TestPrometheusRemoteWrite/assert_remote_write_cluster_id_relabel_config_works is flaky and keeps blocking PR merges. After investigation it seems like the timeout to wait for the expected value is simply to short.

Description of problem:

When forcing a reboot of a BMH with the annotation  reboot.metal3.io: '{"force": true}' with a new preprovisioningimage URL the host never reboots.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-05-03-150228

How reproducible:

100%

Steps to Reproduce:

1. Create a BMH and stall the provisioning process at "provisioning"
2. Set a new URL in the preprovisioningimage
3. Set the force reboot annotation on the BMH (reboot.metal3.io: '{"force": true}')

Actual results:

Host does not reboot and the annotation remains on the BMH

Expected results:

Host reboots into the new image

Additional info:

This was reproduced using assisted installer (MCE central infrastructure management)

The dev workflow for OCP operators wanting to use feature gates is

1) change openshift/api
2) bump openshift/api in cluster-config-operator (CCO)
3) bump openshift/api in your operator and add logic for the feature gate

Currently, HyperShift requires its own bump of openshift/api in order to set the proper feature gates, and this is not preferred. It is preferred that the single place where an API bump is required is cluster-config-operator.

Hypershift should use CCO `render` command to generate the FeatureGate CR
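
For context, the FeatureGate CR in question is the cluster-scoped config.openshift.io singleton; a minimal sketch (the featureSet value here is purely illustrative, not taken from this card):

apiVersion: config.openshift.io/v1
kind: FeatureGate
metadata:
  name: cluster
spec:
  featureSet: TechPreviewNoUpgrade   # illustrative; the actual value would come from the HostedCluster configuration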

Description of problem:

Install IPI cluster where all nodes are provisioned from azure marketplace image with purchase plan.

install-config.yaml:
---------------------------
platform:
  azure:
    region: eastus
    baseDomainResourceGroupName: os4-common
    defaultMachinePlatform:
      osImage:
        publisher: Redhat  <----  contains uppercase letter
        offer: rh-ocp-worker
        sku: rh-ocp-worker
        version: 4.8.2021122100
        plan: WithPurchasePlan

As some marketplace images are free and have no plan, the publisher in install-config should come from the output of `az vm image list`:

# az vm image list --offer rh-ocp-worker --all -otable
Architecture    Offer          Publisher       Sku                 Urn                                                             Version
--------------  -------------  --------------  ------------------  --------------------------------------------------------------  --------------
x64             rh-ocp-worker  redhat-limited  rh-ocp-worker       redhat-limited:rh-ocp-worker:rh-ocp-worker:4.8.2021122100       4.8.2021122100
x64             rh-ocp-worker  RedHat          rh-ocp-worker       RedHat:rh-ocp-worker:rh-ocp-worker:4.8.2021122100               4.8.2021122100
x64             rh-ocp-worker  redhat-limited  rh-ocp-worker-gen1  redhat-limited:rh-ocp-worker:rh-ocp-worker-gen1:4.8.2021122100  4.8.2021122100
x64             rh-ocp-worker  RedHat          rh-ocp-worker-gen1  RedHat:rh-ocp-worker:rh-ocp-worker-gen1:4.8.2021122100          4.8.2021122100

The image plan is as below; its publisher is lowercase.
# az vm image show --urn RedHat:rh-ocp-worker:rh-ocp-worker:4.8.2021122100 --query plan
{
  "name": "rh-ocp-worker",
  "product": "rh-ocp-worker",
  "publisher": "redhat"
}

In the installer (https://github.com/openshift/installer/blob/master/data/data/azure/bootstrap/main.tf#L243-L246), the publisher property in the image plan is taken from the publisher set in install-config.yaml; the installer should instead use the publisher property from the image plan output.

But the image plan is case-sensitive, so in this case the bootstrap instance fails to provision with the error below.

Unable to deploy from the Marketplace image or a custom image sourced from Marketplace image. The part number in the purchase information for VM '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/jima15image1-flg24-rg/providers/Microsoft.Compute/virtualMachines/jima15image1-flg24-bootstrap' is not as expected. Beware that the Plan object's properties are case-sensitive. Learn more about common virtual machine error codes.

Similar errors occur when provisioning worker instances from this image, where the image publisher contains uppercase letters but the publisher in its plan is all lowercase.

worker machineset:
----------------------------
Spec:
  Lifecycle Hooks:
  Metadata:
  Provider ID:  azure:///subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/ci-op-cc5g2rw8-55267-q66k7-rg/providers/Microsoft.Compute/virtualMachines/ci-op-cc5g2rw8-55267-q66k7-worker-southcentralus1-dq6sp
  Provider Spec:
    Value:
      Accelerated Networking:  true
      API Version:             machine.openshift.io/v1beta1
      Credentials Secret:
        Name:       azure-cloud-credentials
        Namespace:  openshift-machine-api
      Diagnostics:
        Boot:
          Storage Account Type:  AzureManaged
      Image:
        Offer:           rh-ocp-worker
        Publisher:       RedHat
        Resource ID:     
        Sku:             rh-ocp-worker
        Type:            WithPurchasePlan
        Version:         4.8.2021122100
      Kind:              AzureMachineProviderSpec
      Location:          southcentralus
      Managed Identity:  ci-op-cc5g2rw8-55267-q66k7-identity

error when provision worker instance:
Unable to deploy from the Marketplace image or a custom image sourced from Marketplace image. The part number in the purchase information for VM '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/ci-op-cc5g2rw8-55267-q66k7-rg/providers/Microsoft.Compute/virtualMachines/ci-op-cc5g2rw8-55267-q66k7-worker-southcentralus1-mmr2h' is not as expected. Beware that the Plan object's properties are case-sensitive. Learn more about common virtual machine error codes.

 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

Always on 4.14 for bootstrap/masters
Always on 4.11+ for workers

Steps to Reproduce:

1. Config osImage for all nodes in install-config, set publisher to RedHat 
2. install cluster.
3.

Actual results:

Bootstrap instance fails to provision.

Expected results:

installation is successful.

Additional info:

Installation is successful when setting publisher to "redhat"
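
For illustration, an install-config osImage stanza consistent with the plan's lowercase publisher (reusing the same values as the reproduction above) would look like this:

platform:
  azure:
    region: eastus
    defaultMachinePlatform:
      osImage:
        publisher: redhat   # lowercase, matching the publisher in the image plan output
        offer: rh-ocp-worker
        sku: rh-ocp-worker
        version: 4.8.2021122100
        plan: WithPurchasePlan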

Description of problem:

More than one cluster can be created in openshift-cluster-api

$ oc get cluster                                                             
NAME                          PHASE          AGE   VERSION
ci-ln-kv1gj4b-72292-jn4rw     Provisioning   19m
ci-ln-kv1gj4b-72292-jn4rw-1   Provisioning   7s

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2022-11-25-204445

How reproducible:

Always

Steps to Reproduce:

1. 
2.
3.

Actual results:

More than one cluster can be created in openshift-cluster-api
$ oc get cluster                                                             NAME                          PHASE          AGE   VERSION ci-ln-kv1gj4b-72292-jn4rw     Provisioning   19m ci-ln-kv1gj4b-72292-jn4rw-1   Provisioning   7s

Expected results:

The openshift-cluster-api namespace should contain only the cluster you're running on; users should be able to use Cluster API to create other clusters only in other namespaces.

Additional info:

Related to https://issues.redhat.com/browse/OCPBUGS-1493

Removes the version check on reconciling the image content type policy since that is not needed in release image versions greater than 4.13.

Description of problem:

Power edge router (PER) is a technology that we are using in place of cloud connections. As zones transition to this new technology, any workspace created there will only support PER, while old workspaces will support cloud connections. We need to check which technology should be used.

Version-Release number of selected component (if applicable):

 

How reproducible:

Easily.

Steps to Reproduce:

1. Create a new Power Virtual Server Workspace in wdc06
2. Deploy OCP 4.14 to that workspace with the Installer
3. Deploy will fail

Actual results:

A cluster will fail to deploy as it tries to provision a cloud connection

Expected results:

A cluster will be created with PER if it's a new workspace. If it's an old workspace, continue to use cloud connections.

Additional info:

Fix in review.

Description of problem:

- Calico Virtual NICs should be excluded from node_exporter collector.
- All NICs beginning with cali* should be added to collector.netclass.ignored-devices to ensure that metrics are not collected.
- node_exporter is meant to collect metrics for physical interfaces only. 

Version-Release number of selected component (if applicable):

OpenShift 4.12

How reproducible:

Always

Steps to Reproduce:

Run an OpenShift cluster using Calico SDN.
Observe -> Metrics -> Run the following PromQL query: "group by(device) (node_network_info)"
Observe that Calico virtual NICs are present.

Actual results:

Calico virtual NICs are present in OCP metrics.

Expected results:

Only physical network interfaces should be present.

Additional info:

Similar to this bug, but for Calico virtual NICs: https://issues.redhat.com/browse/OCPBUGS-1321

Description of problem:

The machine-config operator is not compliant with the CIS benchmark rule "Ensure Usage of Unique Service Accounts" [1], part of the "ocp4-cis" profile used by the Compliance Operator [2]. We observed that the machine-config operator is using the default service account (the default SA comes into play when no other service account is specified). OpenShift core operators should be compliant with the CIS benchmark, i.e. the operators should run with their own ServiceAccount rather than using the "default" one.


[1] https://static.open-scap.org/ssg-guides/ssg-ocp4-guide-cis.html#xccdf_org.ssgproject.content_group_accounts
[2] https://docs.openshift.com/container-platform/4.11/security/compliance_operator/compliance-operator-supported-profiles.html

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

Core operators are using default service account

Expected results:

Core operators should run with their own service account 
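
A generic sketch of that pattern (namespace and names are illustrative, not the actual machine-config-operator manifests): define a dedicated ServiceAccount and reference it from the operator Deployment instead of relying on "default".

apiVersion: v1
kind: ServiceAccount
metadata:
  name: machine-config-operator
  namespace: openshift-machine-config-operator
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: machine-config-operator
  namespace: openshift-machine-config-operator
spec:
  template:
    spec:
      serviceAccountName: machine-config-operator   # explicit SA instead of the namespace "default" SA
      # ... containers and the rest of the pod spec omitted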

Additional info:

 

Description of problem:

We have presubmit and periodic jobs failing on

: [sig-arch] events should not repeat pathologically for namespace openshift-monitoring
{  2 events happened too frequently

event happened 21 times, something is wrong: ns/openshift-monitoring statefulset/prometheus-k8s hmsg/6f9bc9e1d7 - pathological/true reason/RecreatingFailedPod StatefulSet openshift-monitoring/prometheus-k8s is recreating failed Pod prometheus-k8s-1 From: 16:11:36Z To: 16:11:37Z result=reject 
event happened 22 times, something is wrong: ns/openshift-monitoring statefulset/prometheus-k8s hmsg/ecfdd1d225 - pathological/true reason/SuccessfulDelete delete Pod prometheus-k8s-1 in StatefulSet prometheus-k8s successful From: 16:11:36Z To: 16:11:37Z result=reject }

The failure occurs when the event happens over 20 times.

The RecreatingFailedPod reason shows up in 4.14 and Presubmits and does not show up in 4.13.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Run presubmits or periodics; here are latest examples:

 2023-05-24 06:25:52.551883+00 | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1661210557367193600                                                                                        | {aws,amd64,sdn,ha,serial}
 2023-05-24 10:20:54.91883+00  | https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-gcp-sdn-serial/1661267817128792064                                                                                   | {gcp,amd64,sdn,ha,serial}
 2023-05-24 14:17:18.849402+00 | https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/27899/pull-ci-openshift-origin-master-e2e-gcp-ovn-upgrade/1661321663389634560                                                                                      | {gcp,amd64,ovn,upgrade,upgrade-micro,ha}
 2023-05-24 14:17:51.908405+00 | https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_kubernetes/1583/pull-ci-openshift-kubernetes-master-e2e-azure-ovn-upgrade/1661324100011823104                                                            | {azure,amd64,ovn,upgrade,upgrade-micro,ha}

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

That event/reason should not show up as a failure in the pathological test

Additional info:

This table shows which variants are affected on 4.14 and Presubmits:

                     variants                     | test_count 
--------------------------------------------------+------------
 {aws,amd64,ovn,upgrade,upgrade-micro,ha}         |         63
 {gcp,amd64,ovn,upgrade,upgrade-micro,ha}         |         14
 {gcp,amd64,sdn,ha,serial,techpreview}            |         12
 {azure,amd64,sdn,ha,serial,techpreview}          |          7
 {aws,amd64,sdn,upgrade,upgrade-micro,ha}         |          6
 {aws,amd64,ovn,ha}                               |          6
 {vsphere-ipi,amd64,ovn,upgrade,upgrade-micro,ha} |          5
 {aws,amd64,sdn,ha,serial}                        |          5
 {azure,amd64,ovn,upgrade,upgrade-micro,ha}       |          5
 {metal-ipi,amd64,ovn,upgrade,upgrade-micro,ha}   |          5
 {vsphere-ipi,amd64,ovn,ha,serial}                |          4
 {gcp,amd64,sdn,ha,serial}                        |          3
 {aws,amd64,ovn,single-node}                      |          3
 {metal-ipi,amd64,ovn,ha,serial}                  |          2
 {aws,amd64,ovn,ha,serial}                        |          2
 {aws,amd64,upgrade,upgrade-micro,ha}             |          1
 {aws,arm64,sdn,ha,serial}                        |          1
 {aws,arm64,ovn,ha,serial,techpreview}            |          1
 {vsphere-ipi,amd64,ovn,ha,serial,techpreview}    |          1
 {aws,amd64,sdn,ha,serial,techpreview}            |          1
 {libvirt,ppc64le,ovn,ha,serial}                  |          1
 {amd64,upgrade,upgrade-micro,ha}                 |          1

Just for my record, I'm using this query to check 4.14 and Presubmits:

SELECT
    rt.created_at, url, variants
FROM
    prow_jobs pj
    JOIN prow_job_runs r ON r.prow_job_id = pj.id
    JOIN prow_job_run_tests rt ON rt.prow_job_run_id = r.id
    JOIN prow_job_run_test_outputs o ON o.prow_job_run_test_id = rt.id
    JOIN tests ON rt.test_id = tests.id
WHERE
    pj.release IN ('4.14', 'Presubmits')
    AND rt.status = 12
    AND tests.id = 65991
    AND o.output LIKE '%RecreatingFailedPod%'
ORDER BY rt.created_at, variants DESC;

And this query for checking 4.13:

SELECT
    rt.created_at, url, variants
FROM
    prow_jobs pj
    JOIN prow_job_runs r ON r.prow_job_id = pj.id
    JOIN prow_job_run_tests rt ON rt.prow_job_run_id = r.id
    JOIN prow_job_run_test_outputs o ON o.prow_job_run_test_id = rt.id
    JOIN tests ON rt.test_id = tests.id
WHERE
    pj.release IN ('4.13')
    AND rt.status = 12
    AND tests.id IN (65991, 244,245)
    AND o.output LIKE '%RecreatingFailedPod%'
ORDER BY rt.created_at, variants DESC;

This shows jobs beginning on 4/13 to today.

Description of problem:

The PowerVS installer will have code which creates a new service instance during installation.  Therefore, we need to delete that service instance upon cluster deletion.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Create cluster
2. Delete cluster

Actual results:

No leftover service instance

Expected results:


Additional info:


Background

When we run our agent we set the proxy environment variables as can be seen here

When the user SSHs into the host, the shell does not have those environment variables set.

Issue

This means that when the user is trying to debug network connectivity (for example, in day-2 users often SSH in to see why they can't reach the day-1 cluster's API), they will usually run curl to see whether they can reach the URL themselves. But curl might behave differently than the agent, because the shell, by default, doesn't use the proxy settings.

Solution

Set the default environment variables (through .profile) of the core and root shells to include the same proxy environment variables as the agent, so that when the user logs into the host to run commands, they would have the same proxy settings as the ones the agent has.

Example

One example where we ran into this issue: a customer forgot to set the correct noProxy settings in the UI during day-2, so the agent was complaining about not being able to reach the day-1 API server (as the API server is unreachable through the proxy), but when we SSHd into the host and tried to curl, everything seemed to be working fine. Only after we ran tcpdump to see the difference in requests did we notice that the agent was routing requests through the proxy but curl wasn't, because the shell didn't have the proxy settings by default. If the shell had the correct proxy settings, it would've been easier to troubleshoot the problem.

Description of problem:

The statefulset thanos-ruler-user-workload has no serviceName. As the documentation describes, serviceName is a must for a StatefulSet. I'm not sure if we need a service here, but one question: if we don't need a service, why not use a regular Deployment? Thanks!

MacBook-Pro:k8sgpt jianzhang$ oc explain statefulset.spec.serviceName 
KIND:     StatefulSet
VERSION:  apps/v1

FIELD:    serviceName <string>

DESCRIPTION:
     serviceName is the name of the service that governs this StatefulSet. This
     service must exist before the StatefulSet, and is responsible for the
     network identity of the set. Pods get DNS/hostnames that follow the
     pattern: pod-specific-string.serviceName.default.svc.cluster.local where
     "pod-specific-string" is managed by the StatefulSet controller.

MacBook-Pro:k8sgpt jianzhang$ oc get statefulset -n openshift-user-workload-monitoring -o=jsonpath={.spec.serviceName}
MacBook-Pro:k8sgpt jianzhang$ 

MacBook-Pro:k8sgpt jianzhang$ oc get statefulset -n openshift-user-workload-monitoring
NAME                         READY   AGE
prometheus-user-workload     2/2     4h44m
thanos-ruler-user-workload   2/2     4h44m

MacBook-Pro:k8sgpt jianzhang$ oc get svc -n openshift-user-workload-monitoring
NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                       AGE
prometheus-operated                       ClusterIP   None            <none>        9090/TCP,10901/TCP            4h44m
prometheus-operator                       ClusterIP   None            <none>        8443/TCP                      4h44m
prometheus-user-workload                  ClusterIP   172.30.46.204   <none>        9091/TCP,9092/TCP,10902/TCP   4h44m
prometheus-user-workload-thanos-sidecar   ClusterIP   None            <none>        10902/TCP                     4h44m
thanos-ruler                              ClusterIP   172.30.110.49   <none>        9091/TCP,9092/TCP,10901/TCP   4h44m
thanos-ruler-operated                     ClusterIP   None            <none>        10902/TCP,10901/TCP           4h44m


Version-Release number of selected component (if applicable):

MacBook-Pro:k8sgpt jianzhang$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-05-31-080250   True        False         7h30m   Cluster version is 4.14.0-0.nightly-2023-05-31-080250

How reproducible:

always

Steps to Reproduce:

1. Install OCP 4.14 cluster.
2. Check cluster's statefulset instances or run `k8sgpt analyze -d`
3.

Actual results:

MacBook-Pro:k8sgpt jianzhang$ k8sgpt analyze -d
Service nfs-provisioner/example.com-nfs does not exist
AI Provider: openai


0 openshift-user-workload-monitoring/thanos-ruler-user-workload(thanos-ruler-user-workload)
- Error: StatefulSet uses the service openshift-user-workload-monitoring/ which does not exist.
  Kubernetes Doc: serviceName is the name of the service that governs this StatefulSet. This service must exist before the StatefulSet, and is responsible for the network identity of the set. Pods get DNS/hostnames that follow the pattern: pod-specific-string.serviceName.default.svc.cluster.local where "pod-specific-string" is managed by the StatefulSet controller.

Expected results:

The serviceName is set for the StatefulSet.
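
A minimal sketch of what that could look like, assuming the existing headless thanos-ruler-operated Service (listed above) is the governing service:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-ruler-user-workload
  namespace: openshift-user-workload-monitoring
spec:
  serviceName: thanos-ruler-operated   # headless service providing the pods' DNS identity
  # ... remaining fields unchanged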

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-update-keys/pull/48

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

oc debug fails with error "container "container-00" in pod "xiyuan24-f3-h4264-master-0-debug" is waiting to start: ContainerCreating". The above error happens when run via automation; running it locally does not have this issue. Also, when more time is allowed around the command in the automation script, it works fine without any issues.

Version-Release number of selected component (if applicable):

03-24 17:57:54.649        [12:27:48] INFO> Shell Commands: oc version -o yaml --client --kubeconfig=/tmp/kubeconfig20230324-374-gt1vvm
03-24 17:57:54.649        clientVersion:
03-24 17:57:54.649          buildDate: "2023-03-17T23:32:35Z"
03-24 17:57:54.649          compiler: gc
03-24 17:57:54.649          gitCommit: eed143055ede731029931ad204b19cd2f565ef1a
03-24 17:57:54.649          gitTreeState: clean
03-24 17:57:54.649          gitVersion: 4.13.0-202303172327.p0.geed1430.assembly.stream-eed1430
03-24 17:57:54.649          goVersion: go1.19.4
03-24 17:57:54.649          major: ""
03-24 17:57:54.649          minor: ""
03-24 17:57:54.649          platform: linux/amd64
03-24 17:57:54.649        kustomizeVersion: v4.5.7
03-24 17:57:54.649        [12:27:49] INFO> Exit Status: 0 

How reproducible:

Always

Steps to Reproduce:

1.Install latest 4.13 cluster
2. Run script https://github.com/openshift/verification-tests/blob/master/features/upgrade/security_compliance/fips.feature#L66

Actual results:

Test fails with error mentioned in the description

Expected results:

Test should not fail

Additional info:

Adding a link to the conversation which I had with Maciej about this issue: https://redhat-internal.slack.com/archives/GK58XC2G2/p1679655589922729

Run log with --loglevel=9 -> https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Runner/770180/console

Description of problem:

This came out of the investigation of https://issues.redhat.com/browse/OCPBUGS-11691 . The nested node configs used to support dual stack VIPs do not correctly respect the EnableUnicast setting. This is causing issues on EUS upgrades where the unicast migration cannot happen until all nodes are on 4.12. This is blocking both the workaround and the eventual proper fix.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Always

Steps to Reproduce:

1. Deploy 4.11 with unicast explicitly disabled (via MCO patch)
2. Write /etc/keepalived/monitor-user.conf to suppress unicast migration
3. Upgrade to 4.12

Actual results:

Nodes come up in unicast mode

Expected results:

Nodes remain in multicast mode until monitor-user.conf is removed

Additional info:

 

Description of problem:

During installation:

level=error msg=Error: reading Security Group (sg-0f07c871bdbd6379f) Rules: UnauthorizedOperation: You are not authorized to perform this operation.
level=error msg=	status code: 403, request id: f3e18ac0-f2fc-471f-8055-7194112c8225 

Users are unable to create the security groups for the bootstrap node

Version-Release number of selected component (if applicable):

 

How reproducible:

Always 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

Warning/Error should come up when the permission does not exist.

Additional info:

 

Description of problem:
When the user edits a deployment and switches (just) the rollout "Strategy type", the form can't be saved because the Save button stays disabled.

Version-Release number of selected component (if applicable):
4.13

How reproducible:
Always

Steps to Reproduce:

  1. Import an application from git
  2. Select action "Edit Deployment"
  3. Change the "Strategy type" value

Actual results:
Save button stays disabled

Expected results:
Save button should enable when changing a value (that doesn't make the form state invalid)

Additional info:

Description of problem:

Currently PowerVS uses a DefaultMachineCIDR: 192.168.0.0/24
This will create network conflicts if another cluster is created in the same zone.

Version-Release number of selected component (if applicable):

current master branch

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:

The fix is to use a random number for DefaultMachineCIDR: 192.168.%d.0/24. This should significantly reduce the chances of collisions.

OAuth-Proxy should send an Audit-Id header with its requests to the kube-apiserver so that we can easily track its requests and be able to tell which arrived and which were processed.

This comes from a time when the CI was in disarray and oauth-proxy requests were failing to reach the KAS but we did not know if at least any were processed or if they were just all plainly rejected somewhere in the middle.

Description of problem:

Currently, when the oc-mirror command runs, the generated ImageContentSourcePolicy.yaml should not include mirrors for the mirrored operator catalogs.

This should be the case for registry located catalogs and oci fbc catalogs (located on disk)
Jennifer Power, Alex Flom can you help us confirm this is the expected behavior?

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1.Run the oc mirror command mirroring the catalog
/bin/oc-mirror --config imageSetConfig.yaml  docker://localhost:5000  --use-oci-feature  --dest-use-http  --dest-skip-tls
with imagesetconfig:
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /tmp/storageBackend
mirror:
  operators:
  - catalog: oci:///home/user/catalogs/rhop4.12
    # copied from registry.redhat.io/redhat/redhat-operator-index:v4.12
    targetCatalog: "mno/redhat-operator-index"
    targetVersion: "v4.12"
    packages:
    - name: aws-load-balancer-operator

Actual results:

Catalog is included in the imageContentSourcePolicy.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operator-index
  namespace: openshift-marketplace
spec:
  image: localhost:5000/mno/redhat-operator-index:v4.12
  sourceType: grpc

---
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  labels:
    operators.openshift.org/catalog: "true"
  name: operator-0
spec:
  repositoryDigestMirrors:
  - mirrors:
    - localhost:5000/albo
    source: registry.redhat.io/albo
  - mirrors:
    - localhost:5000/mno
    source: mno
  - mirrors:
    - localhost:5000/openshift4
    source: registry.redhat.io/openshift4

Expected results:

No catalog should be included in the imageContentSourcePolicy.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: redhat-operator-index
  namespace: openshift-marketplace
spec:
  image: localhost:5000/mno/redhat-operator-index:v4.12
  sourceType: grpc

---
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  labels:
    operators.openshift.org/catalog: "true"
  name: operator-0
spec:
  repositoryDigestMirrors:
  - mirrors:
    - localhost:5000/albo
    source: registry.redhat.io/albo
  - mirrors:
    - localhost:5000/openshift4
    source: registry.redhat.io/openshift4

Additional info:

 

Description of problem:

Unable to set protectKernelDefaults from "true" to "false" in kubelet.conf on the nodes in RHOCP4.13 although this was possible in RHOCP4.12.

Version-Release number of selected component (if applicable):

   Red Hat OpenShift Container Platform Version Number: 4
   Release Number: 13
   Kubernetes Version: v1.26.3+b404935
   Docker Version: N/A
   Related Package Version: 
	   - cri-o-1.26.3-3.rhaos4.13.git641290e.el9.x86_64
   Related Middleware/Application: none
   Underlying RHEL Release Number: Red Hat Enterprise Linux CoreOS release 4.13
   Underlying RHEL Architecture: x86_64
   Underlying RHEL Kernel Version: 5.14.0-284.13.1.el9_2.x86_64
   
Drivers or hardware or architecture dependency: none

How reproducible:


 always

Steps to Reproduce:

    1. Deploy OCP cluster using RHCOS
    2. Set protectKernelDefaults as true using the document [1]

Actual results:

protectKernelDefaults can't be set.

Expected results:

 protectKernelDefaults can be set.

Additional info:



The desired protectKernelDefaults value is NOT set in kubelet.conf:

    ---
    # oc debug node/ocp4-worker1

    # chroot /host

    # cat /etc/kubernetes/kubelet.conf
      ...
      "protectKernelDefaults": true, <- NOT modified. Moreover, the format is changed to json.
      ...
    ---

Also, "protectKernelDefaults: false" does not seem to be set in the MachineConfig created by the KubeletConfig kind. See below:

    ---
    # oc get mc 99-worker-generated-kubelet -o yaml
    ...
    storage:
      files:
      - contents:
          compression: "" 
          source: data:text/plain;charset=utf-8;base64, [The contents of kubelet.conf encoded with base64]
        mode: 420
        overwrite: true
        path: /etc/kubernetes/kubelet.conf

    // Write [The contents of kubelet.conf encoded with base64] to the file.
    # vim kubelet.conf 

    // Decode [The contents of kubelet.conf encoded with base64]
    # cat kubelet.conf | base64 -d
    ...
    "protectKernelDefaults": true, <- "protectKernelDefaults: false" is not set.
    ----



[1] https://access.redhat.com/solutions/6974438
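
For reference, a minimal KubeletConfig sketch of the change being attempted (the object name and pool selector below are illustrative, not taken from the case):

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: kubelet-protect-kernel-defaults
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""   # target the worker pool
  kubeletConfig:
    protectKernelDefaults: false   # the value the customer is trying to apply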

Description of problem:
Issue - Profiles are degraded [1] even after being applied, due to the error below [2]:

[1]

$oc get profile -A
NAMESPACE                                NAME                                          TUNED                APPLIED   DEGRADED   AGE
openshift-cluster-node-tuning-operator   master0    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master1    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master2    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   worker0    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker1    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker10   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker11   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker12   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker13   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker14   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker15   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker2    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker3    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker4  rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker5    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker6    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker7    rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker8   rdpmc-patch-worker   True      True       5d
openshift-cluster-node-tuning-operator   worker9   rdpmc-patch-worker   True      True       5d

[2]

  lastTransitionTime: "2023-12-05T22:43:12Z"
    message: TuneD daemon issued one or more sysctl override message(s) during profile
      application. Use reapply_sysctl=true or remove conflicting sysctl net.core.rps_default_mask
    reason: TunedSysctlOverride
    status: "True"

If we see in rdpmc-patch-master tuned:

NAMESPACE                                NAME                                          TUNED                APPLIED   DEGRADED   AGE
openshift-cluster-node-tuning-operator   master0    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master1    rdpmc-patch-master   True      True       5d
openshift-cluster-node-tuning-operator   master2    rdpmc-patch-master   True      True       5d

We are configuring below in rdpmc-patch-master tuned:

$ oc get tuned rdpmc-patch-master -n openshift-cluster-node-tuning-operator -oyaml |less
spec:
  profile:
  - data: |
      [main]
      include=performance-patch-master
      [sysfs]
      /sys/devices/cpu/rdpmc = 2
    name: rdpmc-patch-master
  recommend:

Below in Performance-patch-master which is included in above tuned:

spec:
  profile:
  - data: |
      [main]
      summary=Custom tuned profile to adjust performance
      include=openshift-node-performance-master-profile
      [bootloader]
      cmdline_removeKernelArgs=-nohz_full=${isolated_cores}

Below(which is coming in error) is in openshift-node-performance-master-profile included in above tuned:

net.core.rps_default_mask=${not_isolated_cpumask}

RHEL BUg has been raised for the same https://issues.redhat.com/browse/RHEL-18972

Version-Release number of selected component (if applicable):

4.14

Description of problem:

The kube apiserver manages the endpoints resource of the default/kubernetes service so that pods can access the kube apiserver. It does this via the --advertise-address flag and the container port for the kube apiserver pod. Currently the HCCO overwrites the endpoints resource with another port. This conflicts with what the KAS manages, it should not do that.

Version-Release number of selected component (if applicable):

 

How reproducible:

always

Steps to Reproduce:

1. Create an AWS publicAndPrivate cluster with DNS hostnames and a Route publishing strategy for the apiserver.

Actual results:

The HCCO overwrites the default/kubernetes endpoints resource in the guest cluster.

Expected results:

The HCCO does not overwrite the default/kubernetes endpoints resource 

Additional info:

 

When a HostedCluster is configured as `Private`, annotate the necessary hosted CP components (API and OAuth) so that External DNS can still create public DNS records (pointing to private IP resources).

The External DNS record should be pointing to the resource for the PrivateLink VPC Endpoint. "We need to specify the IP of the A record. We can do that with a cluster IP service."

Context: https://redhat-internal.slack.com/archives/C01C8502FMM/p1675432805760719
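
A rough sketch of the idea, assuming the upstream external-dns.alpha.kubernetes.io/hostname annotation; the service name, namespace, hostname, and ports below are purely illustrative:

apiVersion: v1
kind: Service
metadata:
  name: kube-apiserver-external-dns        # illustrative name
  namespace: clusters-example-hcp          # illustrative hosted control plane namespace
  annotations:
    # public hostname for which External DNS should create the record
    external-dns.alpha.kubernetes.io/hostname: api.example-hc.example.com
spec:
  type: ClusterIP   # per the note above, a cluster IP service supplies the A record target
  ports:
  - name: https
    port: 6443
    targetPort: 6443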

Description of the problem:

In staging, BE 2.18.0, when using the UI to create a new cluster with P/Z CPU arch. and OCP 4.10, the following response is returned:

Non x86_64 CPU architectures for version 4.10 are supported only with User Managed Networking 

How reproducible:

100%

Steps to reproduce:

1. 

2.

3.

Actual results:

 

Expected results:
The message should be clearer so the user understands the issue:
P/Z CPU arch. is only supported with OCP version >= 4.12

Description of problem:

The current version of openshift/router vendors Kubernetes 1.26 packages. OpenShift 4.14 is based on Kubernetes 1.27.   

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Check https://github.com/openshift/router/blob/release-4.14/go.mod 

Actual results:

Kubernetes packages (k8s.io/api, k8s.io/apimachinery, k8s.io/apiserver, and k8s.io/client-go) are at version v0.26

Expected results:

Kubernetes packages are at version v0.27.0 or later.

Additional info:

Using old Kubernetes API and client packages brings risk of API compatibility issues.

Description of problem:

When updating from 4.12 to 4.13, the incoming ovn-k8s-cni-overlay expects RHEL 9, and fails to run on the still-RHEL-8 4.12 nodes.

Version-Release number of selected component (if applicable):

4.13 and 4.14 ovn-k8s-cni-overlay vs. 4.12 RHCOS's RHEL 8.

How reproducible:

100%

Steps to Reproduce:

Picked up in TestGrid.

Actual results:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.13-upgrade-from-stable-4.12-e2e-gcp-ovn-rt-upgrade/1677232369326624768/artifacts/e2e-gcp-ovn-rt-upgrade/gather-extra/artifacts/nodes/ci-op-y7r1x9z3-3a480-9swt7-master-2/journal | zgrep dns-operator | tail -n1
Jul 07 12:34:30.202100 ci-op-y7r1x9z3-3a480-9swt7-master-2 kubenswrapper[2168]: E0707 12:34:30.201720    2168 pod_workers.go:965] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"dns-operator-78cbdc89fd-kckcd_openshift-dns-operator(5c97a52b-f774-40ae-8c17-a17b30812596)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"dns-operator-78cbdc89fd-kckcd_openshift-dns-operator(5c97a52b-f774-40ae-8c17-a17b30812596)\\\": rpc error: code = Unknown desc = failed to create pod network sandbox k8s_dns-operator-78cbdc89fd-kckcd_openshift-dns-operator_5c97a52b-f774-40ae-8c17-a17b30812596_0(1fa1dd2b35100b0f1ec058d79042a316b909e38711fcadbf87bd9a1e4b62e0d3): error adding pod openshift-dns-operator_dns-operator-78cbdc89fd-kckcd to CNI network \\\"multus-cni-network\\\": plugin type=\\\"multus\\\" name=\\\"multus-cni-network\\\" failed (add): [openshift-dns-operator/dns-operator-78cbdc89fd-kckcd/5c97a52b-f774-40ae-8c17-a17b30812596:ovn-kubernetes]: error adding container to network \\\"ovn-kubernetes\\\": netplugin failed: \\\"/var/lib/cni/bin/ovn-k8s-cni-overlay: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /var/lib/cni/bin/ovn-k8s-cni-overlay)\\\\n/var/lib/cni/bin/ovn-k8s-cni-overlay: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /var/lib/cni/bin/ovn-k8s-cni-overlay)\\\\n\\\"\"" pod="openshift-dns-operator/dns-operator-78cbdc89fd-kckcd" podUID=5c97a52b-f774-40ae-8c17-a17b30812596

Expected results:

Successful update.

Additional info:

Both 4.14 and 4.13 control planes can be associated with 4.12 compute nodes, because of EUS-to-EUS updates.

Since the change we made in https://github.com/openshift/assisted-test-infra/pull/1989, whenever deploying assisted installer services using "make run" or "make deploy_assisted_service" we are deploying with only a single image - the default one (e.g. OPENSHIFT_VERSION=4.13).

 

 

Description of problem:

The readme.md of builder is just a one-liner overview of the project. It would be helpful to add some additional details for new contributors/visitors to the project.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:
After upgrading a plugin image the browser continues to request old plugin files

How reproducible:
100%

Steps to Reproduce:
1. Build and deploy a plugin generated from the console-plugin-template repo
2. Open one of the plugin pages in the browser
3. Make a change in the code of that page, rebuild, and deploy a new image
4. Try to view this page in Firefox - you'll get a 404 error. In Chrome you'll get the old page

The root cause: the plugin JS file names are auto-generated, so the new image has different JS file names.
But the plugin-entry.js filename remains the same; that file is cached by default and continues to request the old files.

Description of problem:

The newly introduced `--idms-file` flag in oc image extract is incorrectly mapped to the ICSPFile object instead of IDMSFile.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Bump Kubernetes to 0.27.1 and bump dependencies

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Kubernetes and other associated dependencies need to be updated to protect against potential vulnerabilities.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

When running the nutanix-e2e-windows test from the WMCO PR https://github.com/openshift/windows-machine-config-operator/pull/1398, the MAPI nutanix-controller failed to create the Windows machine VM with the error logs below. It failed to unmarshal the windows-user-data into the IgnitionConfig struct, since the windows-user-data is in PowerShell script format, not the Ignition data format.

I0424 17:37:43.472054       1 recorder.go:103] events "msg"="ci-op-zhi8pd1k-5c595-dnpj5-e2e-wm-f84vt: reconciler failed to Create machine: failed to get user data: Failed to unmarshal userData to IgnitionConfig. invalid character '<' looking for beginning of value" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-zhi8pd1k-5c595-dnpj5-e2e-wm-f84vt","uid":"d3981cb0-4f98-4424-9252-b100521c2a93","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"31045"} "reason"="FailedCreate" "type"="Warning"
E0424 17:37:43.472923       1 controller.go:329]  "msg"="Reconciler error" "error"="ci-op-zhi8pd1k-5c595-dnpj5-e2e-wm-f84vt: reconciler failed to Create machine: failed to get user data: Failed to unmarshal userData to IgnitionConfig. invalid character '<' looking for beginning of value" "controller"="machine-controller" "name"="ci-op-zhi8pd1k-5c595-dnpj5-e2e-wm-f84vt" "namespace"="openshift-machine-api" "object"={"name":"ci-op-zhi8pd1k-5c595-dnpj5-e2e-wm-f84vt","namespace":"openshift-machine-api"} "reconcileID"="16572b5d-2418-4f7c-b7a8-5f08f2659391"

Version-Release number of selected component (if applicable):

 

How reproducible:

When the Machine is configured to be Windows node

Steps to Reproduce:

Run the ci/prow/nutanix-e2e-operator test.

Actual results:

The MAPI nutanix-controller failed to create the Windows VM with the error logs showing above.

Expected results:

The Windows VM and node can be successfully created and provisioned.

Additional info:

 

This is a clone of issue OCPBUGS-22369. The following is the description of the original issue:

Description of problem:

Default security settings for new Azure Storage accounts have been updated. Using ccoctl to create Azure Workload Identity resources in region eastus does not work.

I found several commonly used regions and did the test. The test results are as follows.

List of regions not working properly: eastus

$ az storage account list -g mihuangtt0947-rg-oidc --query "[].[name,allowBlobPublicAccess]" -o tsv
mihuangtt0947rgoidc False


 List of regions working properly: westus, australiacentral, australiaeast, centralus, australiasoutheast, southindia…

$ az storage account list -g mihuangdispri0929-rg-oidc --query "[].[name,allowBlobPublicAccess]" -o tsv
mihuangdispri0929rgoidc	True

Version-Release number of selected component (if applicable):

4.14/4.15

How reproducible:

Always

Steps to Reproduce:

1.Running ccoctl azure create-all command to create azure workload identity resources in region eastus.

[huangmingxia@fedora CCO-bugs]$ ./ccoctl azure create-all  --name 'mihuangp1' --region 'eastus' --subscription-id  {SUBSCRIPTION-ID} --tenant-id {TENANNT-ID} --credentials-requests-dir=./credrequests --dnszone-resource-group-name 'os4-common' --storage-account-name='mihuangp1oidc' --output-dir test

Actual results:

[huangmingxia@fedora CCO-bugs]$  ./ccoctl azure create-all  --name 'mihuangp1' --region 'eastus' --subscription-id  {SUBSCRIPTION-ID} --tenant-id {TENANNT-ID} --credentials-requests-dir=./credrequests --dnszone-resource-group-name 'os4-common' --storage-account-name='mihuangp1oidc' --output-dir test
2023/10/25 11:14:36 Using existing RSA keypair found at test/serviceaccount-signer.private
2023/10/25 11:14:36 Copying signing key for use by installer
2023/10/25 11:14:36 No --oidc-resource-group-name provided, defaulting OIDC resource group name to mihuangp1-oidc
2023/10/25 11:14:36 No --installation-resource-group-name provided, defaulting installation resource group name to mihuangp1
2023/10/25 11:14:36 No --blob-container-name provided, defaulting blob container name to mihuangp1
2023/10/25 11:14:39 Created resource group /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/mihuangp1-oidc
2023/10/25 11:15:01 Created storage account /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/mihuangp1-oidc/providers/Microsoft.Storage/storageAccounts/mihuangp1oidc
2023/10/25 11:15:03 failed to create blob container: PUT https://management.azure.com/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/mihuangp1-oidc/providers/Microsoft.Storage/storageAccounts/mihuangp1oidc/blobServices/default/containers/mihuangp1
--------------------------------------------------------------------------------
RESPONSE 409: 409 Conflict
ERROR CODE: PublicAccessNotPermitted
--------------------------------------------------------------------------------
{
  "error": {
    "code": "PublicAccessNotPermitted",
    "message": "Public access is not permitted on this storage account.\nRequestId:415c51f1-c01e-0017-7ef1-06ec0c000000\nTime:2023-10-25T03:15:02.7928767Z"
  }
}
--------------------------------------------------------------------------------

$ az storage account list -g mihuangtt0947-rg-oidc --query "[].[name,allowBlobPublicAccess]" -o tsv
mihuangtt0947rgoidc False

Expected results:

Resources created successfully.

$ az storage account list -g mihuangtt0947-rg-oidc --query "[].[name,allowBlobPublicAccess]" -o tsv
mihuangtt0947rgoidc True

Additional info:

Google email: Important notice: Default security settings for new Azure Storage accounts will be updated

Follow up for https://issues.redhat.com/browse/HOSTEDCP-975

  • Explore and discuss granular metrics to track NodePool lifecycle bottlenecks: infra, ignition, node networking, available. Consolidate that with hostedClusterTransitionSeconds metrics and dashboard panels.
  • Explore and discuss metrics for upgrade duration SLO for NodePool.

Description of problem:

On Azure, after deleting a master, the old machine is stuck in Deleting and some pods in the cluster are in ImagePullBackOff. Checking from the Azure console, the new master was not added to the load balancer backend; this seems to leave the machine without an internet connection.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-02-12-024338

How reproducible:

Always

Steps to Reproduce:

1. Set up a cluster on Azure, networkType ovn
2. Delete a master
3. Check master and pod

Actual results:

Old machine stuck in Deleting,  some pods are in ImagePullBackOff.
 $ oc get machine    
NAME                                    PHASE      TYPE              REGION   ZONE   AGE
zhsunaz2132-5ctmh-master-0              Deleting   Standard_D8s_v3   westus          160m
zhsunaz2132-5ctmh-master-1              Running    Standard_D8s_v3   westus          160m
zhsunaz2132-5ctmh-master-2              Running    Standard_D8s_v3   westus          160m
zhsunaz2132-5ctmh-master-flqqr-0        Running    Standard_D8s_v3   westus          105m
zhsunaz2132-5ctmh-worker-westus-dhwfz   Running    Standard_D4s_v3   westus          152m
zhsunaz2132-5ctmh-worker-westus-dw895   Running    Standard_D4s_v3   westus          152m
zhsunaz2132-5ctmh-worker-westus-xlsgm   Running    Standard_D4s_v3   westus          152m

$ oc describe machine zhsunaz2132-5ctmh-master-flqqr-0  -n openshift-machine-api |grep -i "Load Balancer"
      Internal Load Balancer:  zhsunaz2132-5ctmh-internal
      Public Load Balancer:      zhsunaz2132-5ctmh

$ oc get node            
NAME                                    STATUS     ROLES                  AGE    VERSION
zhsunaz2132-5ctmh-master-0              Ready      control-plane,master   165m   v1.26.0+149fe52
zhsunaz2132-5ctmh-master-1              Ready      control-plane,master   165m   v1.26.0+149fe52
zhsunaz2132-5ctmh-master-2              Ready      control-plane,master   165m   v1.26.0+149fe52
zhsunaz2132-5ctmh-master-flqqr-0        NotReady   control-plane,master   109m   v1.26.0+149fe52
zhsunaz2132-5ctmh-worker-westus-dhwfz   Ready      worker                 152m   v1.26.0+149fe52
zhsunaz2132-5ctmh-worker-westus-dw895   Ready      worker                 152m   v1.26.0+149fe52
zhsunaz2132-5ctmh-worker-westus-xlsgm   Ready      worker                 152m   v1.26.0+149fe52
$ oc describe node zhsunaz2132-5ctmh-master-flqqr-0
  Warning  ErrorReconcilingNode       3m5s (x181 over 108m)  controlplane         [k8s.ovn.org/node-chassis-id annotation not found for node zhsunaz2132-5ctmh-master-flqqr-0, macAddress annotation not found for node "zhsunaz2132-5ctmh-master-flqqr-0" , k8s.ovn.org/l3-gateway-config annotation not found for node "zhsunaz2132-5ctmh-master-flqqr-0"]

$ oc get po --all-namespaces | grep ImagePullBackOf   
openshift-cluster-csi-drivers                      azure-disk-csi-driver-node-l8ng4                                  0/3     Init:ImagePullBackOff   0              113m
openshift-cluster-csi-drivers                      azure-file-csi-driver-node-99k82                                  0/3     Init:ImagePullBackOff   0              113m
openshift-cluster-node-tuning-operator             tuned-bvvh7                                                       0/1     ImagePullBackOff        0              113m
openshift-dns                                      node-resolver-2p4zq                                               0/1     ImagePullBackOff        0              113m
openshift-image-registry                           node-ca-vxv87                                                     0/1     ImagePullBackOff        0              113m
openshift-machine-config-operator                  machine-config-daemon-crt5w                                       1/2     ImagePullBackOff        0              113m
openshift-monitoring                               node-exporter-mmjsm                                               0/2     Init:ImagePullBackOff   0              113m
openshift-multus                                   multus-4cg87                                                      0/1     ImagePullBackOff        0              113m
openshift-multus                                   multus-additional-cni-plugins-mc6vx                               0/1     Init:ImagePullBackOff   0              113m
openshift-ovn-kubernetes                           ovnkube-master-qjjsv                                              0/6     ImagePullBackOff        0              113m
openshift-ovn-kubernetes                           ovnkube-node-k8w6j                                                0/6     ImagePullBackOff        0              113m

Expected results:

Replace master successful

Additional info:

Tested payload 4.13.0-0.nightly-2023-02-03-145213, same result.
Previously, with 4.13.0-0.nightly-2023-01-27-165107, everything worked well.

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/230

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

As part of a single run, we are basically fetching the same thing over and over again, and hence making API calls that should not even be made.

For example:

1. The privileges check verifies permissions of the datastore, which is also verified by the storageclass check. What is more, each of those checks fetches the datacenter and datastore, resulting in several duplicate API calls.

Exit Criteria:
1. Remove duplicate checks.
2. Avoid fetching the same API object again and again as part of the same system check.

This is a clone of issue OCPBUGS-18494. The following is the description of the original issue:

Description of problem:

The DomainMapping CRD is still using API version v1alpha1, but v1alpha1 will be removed in Serverless Operator version 1.33. So, upgrade the API version to v1beta1, which has been available since Serverless Operator 1.21.

Additional info:

NOTE: This should be backported to 4.11; also check the minimum Serverless Operator version supported in 4.11.

slack thread: https://redhat-internal.slack.com/archives/CJYKV1YAH/p1693809331579619
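
For illustration, a DomainMapping using the v1beta1 API looks roughly like this (domain and service names are placeholders):

apiVersion: serving.knative.dev/v1beta1
kind: DomainMapping
metadata:
  name: app.example.com        # the custom domain being mapped
  namespace: my-namespace
spec:
  ref:
    apiVersion: serving.knative.dev/v1
    kind: Service
    name: my-knative-service   # the Knative Service the domain maps to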

Description of the problem:

A cluster events search with message=\ or message=%5C returns all "writing image to disk" messages,
e.g. "Host: test-infra-cluster-f5e3a8e9-master-1, reached installation stage Writing image to disk: 5%"

 

How reproducible:

100%  

 

Steps to reproduce:

1.Install cluster 

2. List events with message=\ , or message=%5C

 

curl -s -v  --location --request GET 'https://api.stage.openshift.com/api/assisted-install/v2/events?cluster_id=2aa44b94-e533-44fe-9c0f-3b20a3d91b4e&message=%5C' --header "Authorization: Bearer $(ocm token)" | jq '.'

or

curl -s -v  --location --request GET 'https://api.stage.openshift.com/api/assisted-install/v2/events?cluster_id=2aa44b94-e533-44fe-9c0f-3b20a3d91b4e&message=\' --header "Authorization: Bearer $(ocm token)" | jq '.' 

 

Actual results:

All "writing image to disk" events are returned.

 

Expected results:

Only events including '\' are returned.

Due to enabling upstream node-logs viewer feature we have to temporarily disable this test, since the plan to switch to upstream version requires the following steps in order:
1. Modify current patches to match upstream change (being done as part of 1.27 bump)
2. Modify oc to work with both old and new API (being done in parallel with 1.27 bump, will be linked below).
3. Land k8s 1.27.
4. Modify machine-config-operator to enable enableSystemLogQuery config option (can land only after k8s 1.27, will be linked below).
5. Bring the test back.

Description of problem:

This is to track the SDN specific issue in https://issues.redhat.com/browse/OCPBUGS-18389

4.14 nightly has a higher pod ready latency compared to 4.14 ec4 and 4.13.z in node-density (lite) test

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-11-201102

How reproducible:

Everytime

Steps to Reproduce:

1. Install an SDN cluster and scale up to 24 worker nodes, install 3 infra nodes, and move the monitoring, ingress, and registry components to the infra nodes.
2. Run node-density (lite) test with 245 pod per node
3. Compare the pod ready latency to 4.13.z, and 4.14 ec4 

Actual results:

4.14 nightly has a higher pod ready latency compared to 4.14 ec4 and 4.13.10

Expected results:

4.14 should have similar pod ready latency compared to previous release

Additional info:

 
| OCP Version | Flexy Id | Scale Ci Job | Grafana URL | Cloud | Arch Type | Network Type | Worker Count | PODS_PER_NODE | Avg Pod Ready (ms) | P99 Pod Ready (ms) | Must-gather |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4.14.0-ec.4 | 231559 | 292 | 087eb40c-6600-4db3-a9fd-3b959f4a434a | aws | amd64 | SDN | 24 | 245 | 2186 | 3256 | https://drive.google.com/file/d/1NInCiai7WWIIVT8uL-5KKeQl9CtQN_Ck/view?usp=drive_link |
| 4.14.0-0.nightly-2023-09-02-132842 | 231558 | 291 | 62404e34-672e-4168-b4cc-0bd575768aad | aws | amd64 | SDN | 24 | 245 | 58725 | 294279 | https://drive.google.com/file/d/1BbVeNrWzVdogFhYihNfv-99_q8oj6eCN/view?usp=drive_link |

 

With the new multus image provided by Dan Williams in https://issues.redhat.com/browse/OCPBUGS-18389, the 24-node SDN cluster's latency is similar to the run without the fix.

% oc -n openshift-network-operator get deployment.apps/network-operator -o yaml | grep MULTUS_IMAGE -A 1
        - name: MULTUS_IMAGE
          value: quay.io/dcbw/multus-cni:informer 
 % oc get pod -n openshift-multus -o yaml | grep image: | grep multus
      image: quay.io/dcbw/multus-cni:informer
....
| OCP Version | Flexy Id | Scale Ci Job | Grafana URL | Cloud | Arch Type | Network Type | Worker Count | PODS_PER_NODE | Avg Pod Ready (ms) | P99 Pod Ready (ms) | Must-gather |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4.14.0-0.nightly-2023-09-11-201102 quay.io/dcbw/multus-cni:informer | 232389 | 314 | f2c290c1-73ea-4f10-a797-3ab9d45e94b3 | aws | amd64 | SDN | 24 | 245 | 61234 | 311776 | https://drive.google.com/file/d/1o7JXJAd_V3Fzw81pTaLXQn1ms44lX6v5/view?usp=drive_link |
| 4.14.0-ec.4 | 231559 | 292 | 087eb40c-6600-4db3-a9fd-3b959f4a434a | aws | amd64 | SDN | 24 | 245 | 2186 | 3256 | https://drive.google.com/file/d/1NInCiai7WWIIVT8uL-5KKeQl9CtQN_Ck/view?usp=drive_link |
| 4.14.0-0.nightly-2023-09-02-132842 | 231558 | 291 | 62404e34-672e-4168-b4cc-0bd575768aad | aws | amd64 | SDN | 24 | 245 | 58725 | 294279 | https://drive.google.com/file/d/1BbVeNrWzVdogFhYihNfv-99_q8oj6eCN/view?usp=drive_link |

 

Zenghui Shi and Peng Liu requested modifying the multus-daemon-config ConfigMap by removing the readinessindicatorfile flag:

  1. scale down CNO deployment to 0
  2. edit configmap to remove 80-openshift-network.conf (sdn) or 10-ovn-kubernetes.conf (ovn-k)
  3. restart (delete) multus pod on each worker

Steps:

  1. oc scale --replicas=0 -n openshift-network-operator deployments network-operator
  2. oc edit cm multus-daemon-config -n openshift-multus, and remove the line "readinessindicatorfile": "/host/run/multus/cni/net.d/80-openshift-network.conf",
  3. oc get po -n openshift-multus | grep multus | egrep -v "multus-additional|multus-admission" | awk '{print $1}' | xargs oc delete po -n openshift-multus

Now the readinessindicatorfile flag is removed and all multus pods are restarted.

 

% oc get cm multus-daemon-config -n openshift-multus -o yaml | grep readinessindicatorfile -c
0  

Test result: p99 is better than without the fix (removing readinessindicatorfile) but still worse than ec4; avg is still bad.
 

| OCP Version | Flexy Id | Scale Ci Job | Grafana URL | Cloud | Arch Type | Network Type | Worker Count | PODS_PER_NODE | Avg Pod Ready (ms) | P99 Pod Ready (ms) | Must-gather |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4.14.0-0.nightly-2023-09-11-201102 quay.io/dcbw/multus-cni:informer and remove readinessindicatorfile flag | 232389 | 316 | d7a754aa-4f52-49eb-80cf-907bee38a81b | aws | amd64 | SDN | 24 | 245 | 51775 | 105296 | https://drive.google.com/file/d/1h-3JeZXQRO-zsgWzen6aNDQfSDqoKAs2/view?usp=drive_link |

Zenghui Shi and Peng Liu requested setting logLevel to debug in addition to removing the readinessindicatorfile flag:

edit the cm to set "logLevel": "verbose" -> "debug" and restart all multus pods

Now the logLevel is debug and all multus pods are restarted.

% oc get cm multus-daemon-config -n openshift-multus -o yaml | grep logLevel
        "logLevel": "debug",
% oc get cm multus-daemon-config -n openshift-multus -o yaml | grep readinessindicatorfile -c
0 
| OCP Version | Flexy Id | Scale Ci Job | Grafana URL | Cloud | Arch Type | Network Type | Worker Count | PODS_PER_NODE | Avg Pod Ready (ms) | P99 Pod Ready (ms) | Must-gather |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4.14.0-0.nightly-2023-09-11-201102 quay.io/dcbw/multus-cni:informer and remove readinessindicatorfile flag and logLevel=debug | 232389 | 320 | 5d1d3e6a-bfa1-4a4b-bbfc-daedc5605f7d | aws | amd64 | SDN | 24 | 245 | 49586 | 105314 | https://drive.google.com/file/d/1p1PDbnqm0NlWND-komc9jbQ1PyQMeWcV/view?usp=drive_link |

 

Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/61

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

https://github.com/openshift/hypershift/pull/2437 added the ability to override image registries with CR ImageDigestMirrorSet; however, ImageDigestMirrorSet is only valid for 4.13+.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Install HO on Mgmt Cluster 4.12

Steps to Reproduce:

1.
2.
3.

Actual results:

failed to populate image registry overrides: no matches for kind "ImageDigestMirrorSet" in version "config.openshift.io/v1"

Expected results:

No errors and HyperShift doesn't try to use ImageDigestMirrorSet prior to 4.13.
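
A minimal sketch of one way to implement that guard, using client-go discovery to check whether the management cluster actually serves the ImageDigestMirrorSet kind before trying to read it. This is illustrative only, not HyperShift's actual fix:

package registryoverrides

import (
    apierrors "k8s.io/apimachinery/pkg/api/errors"
    "k8s.io/client-go/discovery"
)

// supportsImageDigestMirrorSet reports whether the cluster serves the
// ImageDigestMirrorSet kind in config.openshift.io/v1 (present on 4.13+).
func supportsImageDigestMirrorSet(dc discovery.DiscoveryInterface) (bool, error) {
    resources, err := dc.ServerResourcesForGroupVersion("config.openshift.io/v1")
    if err != nil {
        if apierrors.IsNotFound(err) {
            return false, nil
        }
        return false, err
    }
    for _, r := range resources.APIResources {
        if r.Kind == "ImageDigestMirrorSet" {
            return true, nil
        }
    }
    return false, nil
}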

Additional info:

 

This is a clone of issue OCPBUGS-18990. The following is the description of the original issue:

Description of problem:

The script refactoring from https://github.com/openshift/cluster-etcd-operator/pull/1057 introduced a regression. 

Since the static pod list variable was renamed, it is now empty and won't restore the non-etcd pod yamls anymore. 

Version-Release number of selected component (if applicable):

4.14 and later

How reproducible:

always

Steps to Reproduce:

1. create a cluster
2. restore using cluster-restore.sh

Actual results:

the apiserver and other static pods are not immediately restored

The script only outputs this log:

removing previous backup /var/lib/etcd-backup/member
Moving etcd data-dir /var/lib/etcd/member to /var/lib/etcd-backup
starting restore-etcd static pod

Expected results:

the non-etcd static pods should be immediately restored by moving them into the manifest directory again.

You can see this by the log output:

Moving etcd data-dir /var/lib/etcd/member to /var/lib/etcd-backup
starting restore-etcd static pod
starting kube-apiserver-pod.yaml
static-pod-resources/kube-apiserver-pod-7/kube-apiserver-pod.yaml
starting kube-controller-manager-pod.yaml
static-pod-resources/kube-controller-manager-pod-7/kube-controller-manager-pod.yaml
starting kube-scheduler-pod.yaml
static-pod-resources/kube-scheduler-pod-8/kube-scheduler-pod.yaml

Additional info:

 

 

Description of problem:

Setting the key "a" for platform.gcp.userLabels produces an error message that doesn't explain what exactly is wrong.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-15-164249

How reproducible:

Always

Steps to Reproduce:

1. "create install-config"
2. edit the install-config.yaml to insert userLabels settings (see [1])
3. "create cluster" 

Actual results:

An error message shows up saying the label key "a" is invalid.

Expected results:

There should be no error, according to the statement "A label key can have a maximum of 63 characters and cannot be empty. Label must begin with a lowercase letter, and must contain only lowercase letters, numeric characters, and the following special characters `_-`".

Additional info:

$ openshift-install version
openshift-install 4.14.0-0.nightly-2023-10-15-164249
built from commit 359866f9f6d8c86e566b0aea7506dad22f59d860
release image registry.ci.openshift.org/ocp/release@sha256:3c5976a39479e11395334f1705dbd3b56580cd1dcbd514a34d9c796b0a0d9f8e
release architecture amd64
$ openshift-install explain installconfig.platform.gcp.userLabels
KIND:     InstallConfig
VERSION:  v1

RESOURCE: <[]object>
  userLabels has additional keys and values that the installer will add as labels to all resources that it creates on GCP. Resources created by the cluster itself may not include these labels. This is a TechPreview feature and requires setting CustomNoUpgrade featureSet with GCPLabelsTags featureGate enabled or TechPreviewNoUpgrade featureSet to configure labels.

FIELDS:
    key <string> -required-
      key is the key part of the label. A label key can have a maximum of 63 characters and cannot be empty. Label must begin with a lowercase letter, and must contain only lowercase letters, numeric characters, and the following special characters `_-`.
    value <string> -required-
      value is the value part of the label. A label value can have a maximum of 63 characters and cannot be empty. Value must contain only lowercase letters, numeric characters, and the following special characters `_-`.

$ 

[1]
$ yq-3.3.0 r test12/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
  userLabels:
  - key: createdby
    value: installer-qe
  - key: a
    value: hello
$ yq-3.3.0 r test12/install-config.yaml featureSet
TechPreviewNoUpgrade
$ yq-3.3.0 r test12/install-config.yaml credentialsMode
Passthrough
$ openshift-install create cluster --dir test12
ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: platform.gcp.userLabels[a]: Invalid value: "hello": label key is invalid or contains invalid characters. Label key can have a maximum of 63 characters and cannot be empty. Label key must begin with a lowercase letter, and must contain only lowercase letters, numeric characters, and the following special characters `_-` 
$  
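
For reference, the documented key rule can be written as a single regular expression, under which the single-letter key "a" is valid. This is a sketch of the documented constraint, not the installer's actual validation code:

package main

import (
    "fmt"
    "regexp"
)

// labelKeyRE encodes the documented rule: 1-63 characters, starting with a
// lowercase letter, followed by lowercase letters, digits, '_' or '-'.
var labelKeyRE = regexp.MustCompile(`^[a-z][a-z0-9_-]{0,62}$`)

func main() {
    for _, key := range []string{"createdby", "a", "Invalid", ""} {
        fmt.Printf("%q -> valid=%v\n", key, labelKeyRE.MatchString(key))
    }
}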

Description of problem:


    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Description of problem:

This Jira is filed to track upstream issue (fix and backport) 
https://github.com/kubernetes-sigs/azuredisk-csi-driver/issues/1893

Version-Release number of selected component (if applicable):

4.14

Description of problem:

After enabling realtime and high power consumption under workload hints in the performance profile, the test is failing since it cannot find the stalld pid:
msg: "failed to run command [pidof stalld]: output \"\"; error \"\"; command terminated with exit code 1",

Version-Release number of selected component (if applicable):

Openshift 4.14, 4.13

How reproducible:

Often (Flaky test)

Description of problem:

Kubernetes and other associated dependencies need to be updated to protect against potential vulnerabilities.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

The ACM dropdown has a filter and a "Clusters" title even though there are only ever 2 items in the dropdown: local cluster and all clusters. A customer reported this as confusing, since it suggests that many clusters could be added to the dropdown.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

always

Steps to Reproduce:

1. install acm dynamic plugin to cluster
2. open cluster dropdown
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

The agent TUI should show only before the installation, but it shows again during the installation, and when it quits again the installation fails to continue.

Version-Release number of selected component (if applicable):

4.13.0-0.ci-2023-03-14-045458

How reproducible:

always

Steps to Reproduce:

1. Make sure the primary checks pass and boot the agent.x86_64.iso file; the agent TUI shows before the installation.

2. Track the installation via both the wait-for output and the console output.

3. The agent TUI shows again during the installation; wait for it to quit automatically without any user interaction. The installation then fails, and we get the following wait-for output:

DEBUG asset directory: .                           
DEBUG Loading Agent Config...                      
...
DEBUG Agent Rest API never initialized. Bootstrap Kube API never initialized 
INFO Waiting for cluster install to initialize. Sleeping for 30 seconds 
DEBUG Agent Rest API Initialized                   
INFO Cluster is not ready for install. Check validations 
DEBUG Cluster validation: The pull secret is set.  
WARNING Cluster validation: The cluster has hosts that are not ready to install. 
DEBUG Cluster validation: The cluster has the exact amount of dedicated control plane nodes. 
DEBUG Cluster validation: API virtual IPs are not required: User Managed Networking 
DEBUG Cluster validation: API virtual IPs are not required: User Managed Networking 
DEBUG Cluster validation: The Cluster Network CIDR is defined. 
DEBUG Cluster validation: The base domain is defined. 
DEBUG Cluster validation: Ingress virtual IPs are not required: User Managed Networking 
DEBUG Cluster validation: Ingress virtual IPs are not required: User Managed Networking 
DEBUG Cluster validation: The Machine Network CIDR is defined. 
DEBUG Cluster validation: The Cluster Machine CIDR is not required: User Managed Networking 
DEBUG Cluster validation: The Cluster Network prefix is valid. 
DEBUG Cluster validation: The cluster has a valid network type 
DEBUG Cluster validation: Same address families for all networks. 
DEBUG Cluster validation: No CIDRS are overlapping. 
DEBUG Cluster validation: No ntp problems found    
DEBUG Cluster validation: The Service Network CIDR is defined. 
DEBUG Cluster validation: cnv is disabled          
DEBUG Cluster validation: lso is disabled          
DEBUG Cluster validation: lvm is disabled          
DEBUG Cluster validation: odf is disabled          
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Valid inventory exists for the host 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Sufficient CPU cores 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Sufficient minimum RAM 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Sufficient disk capacity 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Sufficient CPU cores for role master 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Sufficient RAM for role master 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Hostname openshift-qe-049.arm.eng.rdu2.redhat.com is unique in cluster 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Hostname openshift-qe-049.arm.eng.rdu2.redhat.com is allowed 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Speed of installation disk has not yet been measured 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host is compatible with cluster platform none 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: VSphere disk.EnableUUID is enabled for this virtual machine 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host agent compatibility checking is disabled 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: No request to skip formatting of the installation disk 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: All disks that have skipped formatting are present in the host inventory 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host is connected 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Media device is connected 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: No Machine Network CIDR needed: User Managed Networking 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host belongs to all machine network CIDRs 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host has connectivity to the majority of hosts in the cluster 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Platform PowerEdge R740 is allowed 
WARNING Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host couldn't synchronize with any NTP server 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host clock is synchronized with service 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: All required container images were either pulled successfully or no attempt was made to pull them 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Network latency requirement has been satisfied. 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Packet loss requirement has been satisfied. 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host has been configured with at least one default route. 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Domain name resolution for the api.zniusno.arm.eng.rdu2.redhat.com domain was successful or not required 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Domain name resolution for the api-int.zniusno.arm.eng.rdu2.redhat.com domain was successful or not required 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Domain name resolution for the *.apps.zniusno.arm.eng.rdu2.redhat.com domain was successful or not required 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host subnets are not overlapping 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: No IP collisions were detected by host 7a9649d8-4167-a1f9-ad5f-385c052e2744 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: cnv is disabled 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: lso is disabled 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: lvm is disabled 
DEBUG Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: odf is disabled 
WARNING Host openshift-qe-049.arm.eng.rdu2.redhat.com: updated status from discovering to insufficient (Host cannot be installed due to following failing validation(s): Host couldn't synchronize with any NTP server) 
INFO Host openshift-qe-049.arm.eng.rdu2.redhat.com validation: Host NTP is synced 
INFO Host openshift-qe-049.arm.eng.rdu2.redhat.com: updated status from insufficient to known (Host is ready to be installed) 
INFO Cluster is ready for install                 
INFO Cluster validation: All hosts in the cluster are ready to install. 
INFO Preparing cluster for installation           
INFO Host openshift-qe-049.arm.eng.rdu2.redhat.com: updated status from known to preparing-for-installation (Host finished successfully to prepare for installation) 
INFO Host openshift-qe-049.arm.eng.rdu2.redhat.com: New image status registry.ci.openshift.org/ocp/4.13-2023-03-14-045458@sha256:b0d518907841eb35adbc05962d4b2e7d45abc90baebc5a82d0398e1113ec04d0. result: success. time: 1.35 seconds; size: 401.45 Megabytes; download rate: 312.54 MBps 
INFO Host openshift-qe-049.arm.eng.rdu2.redhat.com: updated status from preparing-for-installation to preparing-successful (Host finished successfully to prepare for installation) 
INFO Cluster installation in progress             
INFO Host openshift-qe-049.arm.eng.rdu2.redhat.com: updated status from preparing-successful to installing (Installation is in progress) 
INFO Host: openshift-qe-049.arm.eng.rdu2.redhat.com, reached installation stage Starting installation: bootstrap 
INFO Host: openshift-qe-049.arm.eng.rdu2.redhat.com, reached installation stage Installing: bootstrap 
INFO Host: openshift-qe-049.arm.eng.rdu2.redhat.com, reached installation stage Failed: failed executing nsenter [--target 1 --cgroup --mount --ipc --pid -- podman run --net host --pid=host --volume /:/rootfs:rw --volume /usr/bin/rpm-ostree:/usr/bin/rpm-ostree --privileged --entrypoint /usr/bin/machine-config-daemon registry.ci.openshift.org/ocp/4.13-2023-03-14-045458@sha256:f85a278868035dc0a40a66ea7eaf0877624ef9fde9fc8df1633dc5d6d1ad4e39 start --node-name localhost --root-mount /rootfs --once-from /opt/install-dir/bootstrap.ign --skip-reboot], Error exit status 255, LastOutput "...  to initialize single run daemon: error initializing rpm-ostree: Error while ensuring access to kublet config.json pull secrets: symlink /var/lib/kubelet/config.json /run/ostree/auth.json: file exists" 
INFO Cluster has hosts in error                   
INFO cluster has stopped installing... working to recover installation 
INFO cluster has stopped installing... working to recover installation 
INFO cluster has stopped installing... working to recover installation 
INFO cluster has stopped installing... working to recover installation 
INFO cluster has stopped installing... working to recover installation 
INFO cluster has stopped installing... working to recover installation 
INFO cluster has stopped installing... working to recover installation 
INFO cluster has stopped installing... working to recover installation   

4. During the installation, NetworkManager-wait-online.service failed for a while:
-- Logs begin at Wed 2023-03-15 03:06:29 UTC, end at Wed 2023-03-15 03:27:30 UTC. --
Mar 15 03:18:52 openshift-qe-049.arm.eng.rdu2.redhat.com systemd[1]: Starting Network Manager Wait Online...
Mar 15 03:19:55 openshift-qe-049.arm.eng.rdu2.redhat.com systemd[1]: NetworkManager-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Mar 15 03:19:55 openshift-qe-049.arm.eng.rdu2.redhat.com systemd[1]: NetworkManager-wait-online.service: Failed with result 'exit-code'.
Mar 15 03:19:55 openshift-qe-049.arm.eng.rdu2.redhat.com systemd[1]: Failed to start Network Manager Wait Online.

Expected results:

The TUI should only show once before the installation.

Description of problem:

Configure diskEncryptionSet as below in install-config.yaml, and do not set subscriptionId, since it is an optional parameter.

install-config.yaml
--------------------------------
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    azure:
      encryptionAtHost: true
      osDisk:
        diskEncryptionSet:
          resourceGroup: jima07a-rg
          name: jima07a-des
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    azure:
      encryptionAtHost: true
      osDisk:
        diskEncryptionSet:
          resourceGroup: jima07a-rg
          name: jima07a-des
  replicas: 3
platform:
  azure:
    baseDomainResourceGroupName: os4-common
    cloudName: AzurePublicCloud
    outboundType: Loadbalancer
    region: centralus
    defaultMachinePlatform:
      osDisk:
        diskEncryptionSet:
          resourceGroup: jima07a-rg
          name: jima07a-des

Then create the manifests and create the cluster; the installer fails with this error:
$ ./openshift-install create cluster --dir ipi --log-level debug
...
INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json" 
FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": platform.azure.defaultMachinePlatform.osDisk.diskEncryptionSet: Invalid value: azure.DiskEncryptionSet{SubscriptionID:"", ResourceGroup:"jima07a-rg", Name:"jima07a-des"}: failed to get disk encryption set: compute.DiskEncryptionSetsClient#Get: Failure responding to request: StatusCode=400 -- Original Error: autorest/azure: Service returned an error. Status=400 Code="InvalidSubscriptionId" Message="The provided subscription identifier 'resourceGroups' is malformed or invalid." 

Checked the manifest file cluster-config.yaml and found that subscriptionId is not filled in automatically under defaultMachinePlatform:
$ cat cluster-config.yaml
apiVersion: v1
data:
  install-config: |
    additionalTrustBundlePolicy: Proxyonly
    apiVersion: v1
    baseDomain: qe.azure.devcluster.openshift.com
    compute:
    - architecture: amd64
      hyperthreading: Enabled
      name: worker
      platform:
        azure:
          encryptionAtHost: true
          osDisk:
            diskEncryptionSet:
              name: jima07a-des
              resourceGroup: jima07a-rg
              subscriptionId: 53b8f551-f0fc-4bea-8cba-6d1fefd54c8a
            diskSizeGB: 0
            diskType: ""
          osImage:
            offer: ""
            publisher: ""
            sku: ""
            version: ""
          type: ""
      replicas: 3
    controlPlane:
      architecture: amd64
      hyperthreading: Enabled
      name: master
      platform:
        azure:
          encryptionAtHost: true
          osDisk:
            diskEncryptionSet:
              name: jima07a-des
              resourceGroup: jima07a-rg
              subscriptionId: 53b8f551-f0fc-4bea-8cba-6d1fefd54c8a
            diskSizeGB: 0
            diskType: ""
          osImage:
            offer: ""
            publisher: ""
            sku: ""
            version: ""
          type: ""
      replicas: 3
    metadata:
      creationTimestamp: null
      name: jimadesa
    networking:
      clusterNetwork:
      - cidr: 10.128.0.0/14
        hostPrefix: 23
      machineNetwork:
      - cidr: 10.0.0.0/16
      networkType: OVNKubernetes
      serviceNetwork:
      - 172.30.0.0/16
    platform:
      azure:
        baseDomainResourceGroupName: os4-common
        cloudName: AzurePublicCloud
        defaultMachinePlatform:
          osDisk:
            diskEncryptionSet:
              name: jima07a-des
              resourceGroup: jima07a-rg
            diskSizeGB: 0
            diskType: ""
          osImage:
            offer: ""
            publisher: ""
            sku: ""
            version: ""
          type: ""
        outboundType: Loadbalancer
        region: centralus
    publish: External

It works well when setting the disk encryption set without subscriptionId under defaultMachinePlatform or controlPlane/compute.
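
A minimal sketch of the missing defaulting step, i.e. filling an empty subscriptionId on the defaultMachinePlatform disk encryption set the same way it is already filled for controlPlane and compute. The type is a hypothetical stand-in, not the installer's actual code:

package defaults

// DiskEncryptionSet is a stand-in for the install-config field.
type DiskEncryptionSet struct {
    SubscriptionID string
    ResourceGroup  string
    Name           string
}

// fillSubscriptionID defaults an empty SubscriptionID to the subscription the
// installer is running against, so later lookups build a valid resource ID
// instead of one with an empty subscription segment.
func fillSubscriptionID(des *DiskEncryptionSet, sessionSubscriptionID string) {
    if des != nil && des.SubscriptionID == "" {
        des.SubscriptionID = sessionSubscriptionID
    }
}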

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-03-05-104719

How reproducible:

Always on 4.11, 4.12, 4.13

Steps to Reproduce:

1. Prepare install-config, configure diskEncryptionSet under defaultMachinePlatform, controlPlane and compute without subscriptionId
2. Install cluster 
3.

Actual results:

The installer fails with the "InvalidSubscriptionId" error shown above.

Expected results:

The cluster installs successfully.

Additional info:

 

 

 

 

This is a clone of issue OCPBUGS-14698. The following is the description of the original issue:

Description of problem:

Creating an OperatorGroup resource with "name: cluster" causes major issues, and we can't log in to the cluster anymore.

After this command, all "oc" commands fail, including "oc login ...". The console/oauth endpoint showed:
{"error":"server_error","error_description":"The authorization server encountered an unexpected condition that prevented it from fulfilling the request.","state":"c72af27d"}

Notes:
- Restarting the cluster doesn't solve the problem; it's persistent.
- Reproduced with OpenShift 4.12.12 and 4.13.0 in different environments (ROSA, CRC...)
- Using a different name for the OperatorGroup is a simple workaround. The name "cluster" seems to cause the problem.
- It doesn't matter what namespace the OperatorGroup is created in or what the "spec" looks like.

Version-Release number of selected component (if applicable):

 

How reproducible:

Repeatedly

Steps to Reproduce:

Steps to reproduce - by logged in as cluster-admin: 
$ oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: cluster
spec: {}
EOF

Actual results:

 

Expected results:

 

Additional info:

The root cause seems to be that OLM overwrites the "cluster-admin" role

If the cluster enters the installing-pending-user-action state in assisted-service, it will not recover absent user action.
One way to reproduce this is to have the wrong boot order set in the host, so that it reboots into the agent ISO again instead of the installed CoreOS on disk. (I managed this in dev-scripts by setting a root device hint that pointed to a secondary disk, and only creating that disk once the VM was up. This does not add the new disk to the boot order list, and even if you set it manually it does not take effect until after a full shutdown of the VM - the soft reboot doesn't count.)

Currently we report:

cluster has stopped installing... working to recover installation

in a loop. This is not accurate (unlike in e.g. the install-failed state) - it cannot be recovered automatically.

Also we should only report this, or any other, status once when the status changes, and not continuously in a loop.
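
A small sketch of that reporting behaviour, remembering the last status and logging only when it changes (illustrative only; not the actual wait-for code):

package waitfor

import "log"

// statusReporter logs a cluster status message only when the status differs
// from the previously reported one, instead of repeating it on every poll.
type statusReporter struct {
    last string
}

func (r *statusReporter) report(status, msg string) {
    if status == r.last {
        return // same state as the last poll, stay quiet
    }
    r.last = status
    log.Printf("cluster status %q: %s", status, msg)
}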

Description of the problem:
Creating a cluster with ingress VIPs and user managed networking returns an error:

 
{
  "lastProbeTime": "2023-03-01T18:50:41Z",
  "lastTransitionTime": "2023-03-01T18:50:41Z",
  "message": "The Spec could not be synced due to an input error: API VIP cannot be set with User Managed Networking",
  "reason": "InputError",
  "status": "False",
  "type": "SpecSynced"
}

but setting ingress VIPs with user managed networking set to false, and then editing only userManagedNetworking, does not result in any error. Will the cluster be using user managed networking in this case?

How reproducible:

 

Steps to reproduce:

1. apply

apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
  name: acimulinode
  namespace: mfilanov
spec:
  apiVIP: 1.2.3.8
  apiVIPs:
   - 1.2.3.8
  clusterDeploymentRef:
    name: multinode
  imageSetRef:
    name: img4.12.5-x86-64-appsub
  ingressVIP: 1.2.3.10
  platformType: BareMetal
  networking:
    clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    serviceNetwork:
    - 172.30.0.0/16
    userManagedNetworking: false
  provisionRequirements:
    controlPlaneAgents: 3
  compute:
  - hyperthreading: Enabled
    name: worker
  controlPlane:
    hyperthreading: Enabled
    name: master

2. check conditions

kubectl get aci -n mfilanov -o json | jq .items[].status.conditions[]
{
  "lastProbeTime": "2023-03-01T18:52:08Z",
  "lastTransitionTime": "2023-03-01T18:52:08Z",
  "message": "SyncOK",
  "reason": "SyncOK",
  "status": "True",
  "type": "SpecSynced"
}

3. edit user managed network and apply again

apiVersion: extensions.hive.openshift.io/v1beta1
kind: AgentClusterInstall
metadata:
  name: acimulinode
  namespace: mfilanov
spec:
  apiVIP: 1.2.3.8
  apiVIPs:
   - 1.2.3.8
  clusterDeploymentRef:
    name: multinode
  imageSetRef:
    name: img4.12.5-x86-64-appsub
  ingressVIP: 1.2.3.10
  platformType: BareMetal
  networking:
    clusterNetwork:
    - cidr: 10.128.0.0/14
      hostPrefix: 23
    serviceNetwork:
    - 172.30.0.0/16
    userManagedNetworking: true
  provisionRequirements:
    controlPlaneAgents: 3
  compute:
  - hyperthreading: Enabled
    name: worker
  controlPlane:
    hyperthreading: Enabled
    name: master

Actual results:

kubectl get aci -n mfilanov -o json | jq .items[].status.conditions[]
{
  "lastProbeTime": "2023-03-01T18:52:08Z",
  "lastTransitionTime": "2023-03-01T18:52:08Z",
  "message": "SyncOK",
  "reason": "SyncOK",
  "status": "True",
  "type": "SpecSynced"
}

 

Expected results:
We should probably get an error, because the ingress VIPs are already set.

When processing an install-config containing either BMC passwords in the baremetal platform config, or a vSphere password in the vsphere platform config, we log a warning message to say that the value is ignored.

This warning currently includes the value in the password field, which may be inconvenient for users reusing IPI configs who don't want their password values to appear in logs.
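
A sketch of the kind of redaction the warning could apply, using a hypothetical credential struct rather than the installer's real types: report that the field is ignored, but never echo its value:

package validation

// bmcConfig is a stand-in for a credential-bearing platform section.
type bmcConfig struct {
    Address  string
    Username string
    Password string
}

// redactedForLog returns a copy that is safe to include in a warning message:
// the password value is replaced with a placeholder instead of being echoed.
func (c bmcConfig) redactedForLog() bmcConfig {
    if c.Password != "" {
        c.Password = "<redacted>"
    }
    return c
}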

Description of problem:

A cluster update request with empty strings for api_vip and ingress_vip will not remove the cluster VIPs.
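
One way to express the intended update semantics is pointer-based PATCH handling, where a nil pointer means "unchanged" and an explicit empty string means "clear the VIP". The types below are hypothetical stand-ins, not assisted-service's actual models:

package update

// Cluster and ClusterUpdateParams are stand-ins for the service's models.
type Cluster struct {
    APIVip     string
    IngressVip string
}

type ClusterUpdateParams struct {
    APIVip     *string // nil = unchanged, pointer to "" = remove the VIP
    IngressVip *string
}

// applyVIPs applies the update, treating an explicit empty string as a
// request to remove the corresponding VIP.
func applyVIPs(c *Cluster, p ClusterUpdateParams) {
    if p.APIVip != nil {
        c.APIVip = *p.APIVip
    }
    if p.IngressVip != nil {
        c.IngressVip = *p.IngressVip
    }
}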

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. See the following test: https://gist.github.com/nmagnezi/4a3dad01ee197d3984fa7a0604b62cc0
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Along with external disruption tests via api DNS we should also check that apiserver is not disrupted via api-int and service network endpoints

Ref: https://issues.redhat.com/browse/API-1526

Description of problem:



Stop changing the image name for a MachineSet if ClusterOSImage is set.

Terraform has already created an image bucket based on OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE for us, so worker nodes should not use OPENSHIFT_INSTALL_OS_IMAGE_OVERRIDE directly and should instead use the image bucket.

Version-Release number of selected component (if applicable):

current master branch

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Description of problem:

On a freshly installed cluster, the control-plane-machineset-operator begins rolling a new master node, but the machine remains in a Provisioned state and never joins as a node.

Its status is:
Drain operation currently blocked by: [{Name:EtcdQuorumOperator Owner:clusteroperator/etcd}]

The cluster is left in this state until an admin manually removes the stuck master node, at which point a new master machine is provisioned and successfully joins the cluster.

Version-Release number of selected component (if applicable):

4.12.4

How reproducible:

Observed at least 4 times over the last week, but unsure on how to reproduce.

Actual results:

A master node remains in a stuck Provisioned state and requires manual deletion to unstick the control plane machine set process.

Expected results:

No manual interaction should be necessary.

Additional info:

 

Description of problem:

Based on bugs from the ART team (example: https://issues.redhat.com/browse/OCPBUGS-12347), 4.14 images should be built with Go 1.20, but the prometheus container image is built with go1.19.6.

$ token=`oc create token prometheus-k8s -n openshift-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/label/goversion/values' | jq
{
  "status": "success",
  "data": [
    "go1.19.6",
    "go1.20.3"
  ]
}

Searched via the Thanos API:

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query={__name__=~".*",goversion="go1.19.6"}' | jq
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "prometheus_build_info",
          "branch": "rhaos-4.14-rhel-8",
          "container": "kube-rbac-proxy",
          "endpoint": "metrics",
          "goarch": "amd64",
          "goos": "linux",
          "goversion": "go1.19.6",
          "instance": "10.128.2.19:9092",
          "job": "prometheus-k8s",
          "namespace": "openshift-monitoring",
          "pod": "prometheus-k8s-0",
          "prometheus": "openshift-monitoring/k8s",
          "revision": "fe01b9f83cb8190fc8f04c16f4e05e87217ab03e",
          "service": "prometheus-k8s",
          "tags": "unknown",
          "version": "2.43.0"
        },
        "value": [
          1682576802.496,
          "1"
        ]
      },
...

prometheus-k8s-0 container names: [prometheus config-reloader thanos-sidecar prometheus-proxy kube-rbac-proxy kube-rbac-proxy-thanos]; the prometheus image is built with go1.19.6

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- prometheus --version
prometheus, version 2.43.0 (branch: rhaos-4.14-rhel-8, revision: fe01b9f83cb8190fc8f04c16f4e05e87217ab03e)
  build user:       root@402ffbe02b57
  build date:       20230422-00:43:08
  go version:       go1.19.6
  platform:         linux/amd64
  tags:             unknown

$ oc -n openshift-monitoring exec -c config-reloader prometheus-k8s-0 -- prometheus-config-reloader --version
prometheus-config-reloader, version 0.63.0 (branch: rhaos-4.14-rhel-8, revision: ce71a7d)
  build user:       root
  build date:       20230424-15:53:51
  go version:       go1.20.3
  platform:         linux/amd64

$ oc -n openshift-monitoring exec -c thanos-sidecar prometheus-k8s-0 -- thanos --version
thanos, version 0.31.0 (branch: rhaos-4.14-rhel-8, revision: d58df6d218925fd007e16965f50047c9a4194c42)
  build user:       root@c070c5e6af32
  build date:       20230422-00:44:21
  go version:       go1.20.3
  platform:         linux/amd64


# owned by the oauth team, not the responsibility of Monitoring
$ oc -n openshift-monitoring exec -c prometheus-proxy prometheus-k8s-0 -- oauth-proxy --version
oauth2_proxy was built with go1.18.10

# the issue below is tracked by bug OCPBUGS-12821
$ oc -n openshift-monitoring exec -c kube-rbac-proxy prometheus-k8s-0 -- kube-rbac-proxy --version
Kubernetes v0.0.0-master+$Format:%H$

$ oc -n openshift-monitoring exec -c kube-rbac-proxy-thanos prometheus-k8s-0 -- kube-rbac-proxy --version
Kubernetes v0.0.0-master+$Format:%H$

Files that should be fixed:
https://github.com/openshift/prometheus/blob/master/.ci-operator.yaml#L4
https://github.com/openshift/prometheus/blob/master/Dockerfile.ocp#L1

 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-04-26-154754  

How reproducible:

always

Actual results:

4.14 prometheus is built with go1.19.6

Expected results:

4.14 prometheus image should be built with go1.20

Additional info:

no functional impact

Sanitize OWNERS/OWNER_ALIASES:

1) OWNERS must have:

component: "Storage / Kubernetes External Components"

2) OWNER_ALIASES must have all team members of Storage team.

Description of problem:

On 4.14, 'MachineAPI' is marked as an optional capability; disabling it disables two operators, machine-api and cluster-autoscaler.

epic link: https://issues.redhat.com/browse/CNF-6318

The machine-api operator is required for a common IPI cluster (not SNO and not compact), so if "MachineAPI" is disabled in install-config.yaml, installation of a common IPI cluster will fail.

Suggestion: add a pre-check on the installer side for common IPI (not SNO and not compact) when running "openshift-install create cluster". If MachineAPI is disabled, the installer should exit with a corresponding message, as sketched below.
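
A minimal sketch of such a pre-check. The parameter names are assumptions for illustration, not the installer's actual validation code:

package validation

import "fmt"

// validateMachineAPICapability sketches the suggested pre-check: a multi-node
// (non-SNO, non-compact) install needs the machine-api operator to provision
// workers, so reject the config early when the capability is disabled.
func validateMachineAPICapability(machineAPIEnabled bool, computeReplicas int64) error {
    if machineAPIEnabled || computeReplicas == 0 {
        // Either the capability is on, or there are no workers to provision
        // (SNO / compact), so machine-api is not strictly required.
        return nil
    }
    return fmt.Errorf("the MachineAPI capability is disabled but %d compute replicas were requested; enable the capability or set compute replicas to 0", computeReplicas)
}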

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-30-131338

How reproducible:

Always

Steps to Reproduce:

1. Prepare install-config.yaml and set baselineCapabilitySet as None, make sure that compute node number is greater than 0.
2. Run command "openshift-install create cluster" to install common IPI
3.

Actual results:

Installation fails because the machine-api operator is missing.

Expected results:

The installer should have a pre-check for this scenario and exit with an error message if MachineAPI is disabled.

Additional info:

 

When displaying my pipeline, it is not rendered correctly: segments of parallel branches overlap. However, if I edit the pipeline, it appears fine. I have attached screenshots showing the issue.

This is a regression from 4.11, where it rendered fine.

This is a clone of issue OCPBUGS-24261. The following is the description of the original issue:

Description of problem:


    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Description of problem:

Library-go contains code for creating token requests that should be reused by all OpenShift components. Because of time constraints, this code did not make it into `oc` in the past.

Fix that to prevent the code from drifting out of sync.
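
For context, requesting a bound service account token with plain client-go looks roughly like the sketch below; library-go wraps similar logic behind its own helpers, so this is only an illustration of the functionality `oc` should reuse rather than re-implement:

package tokens

import (
    "context"

    authenticationv1 "k8s.io/api/authentication/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
)

// requestToken asks the API server for a bound token for the given service
// account, valid for expirySeconds.
func requestToken(ctx context.Context, cs kubernetes.Interface, namespace, serviceAccount string, expirySeconds int64) (string, error) {
    tr := &authenticationv1.TokenRequest{
        Spec: authenticationv1.TokenRequestSpec{
            ExpirationSeconds: &expirySeconds,
        },
    }
    resp, err := cs.CoreV1().ServiceAccounts(namespace).CreateToken(ctx, serviceAccount, tr, metav1.CreateOptions{})
    if err != nil {
        return "", err
    }
    return resp.Status.Token, nil
}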

Version-Release number of selected component (if applicable):

4.14

How reproducible:

100%

Steps to Reproduce:

1. see if https://github.com/openshift/oc/pull/991 merged

Actual results:

it hasn't merged at the time of writing this bug

Expected results:

it's merged

Additional info:


Description of problem:

A net-attach-def using "type: ovn-k8s-cni-overlay, topology:layer2"
does not work in a hosted pod when using the Kubevirt provider.

Note: As a general hosted multus sanity check, using a "type: bridge" NAD does work properly in a hosted pod and both interfaces start as expected:
  Normal  AddedInterface  86s   multus             Add eth0 [10.133.0.21/23] from ovn-kubernetes
  Normal  AddedInterface  86s   multus             Add net1 [192.0.2.193/27] from default/bridge-net

Version-Release number of selected component (if applicable):

OCP 4.14.1
CNV 4.14.0-2385

How reproducible:

Reproduced w/ multiple attempts when using OVN secondary network

Steps to Reproduce:

1. Create the NAD on the hosted Kubevirt cluster:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: l2-network
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "name": "l2-network",
      "type": "ovn-k8s-cni-overlay",
      "topology":"layer2",
      "netAttachDefName": "default/l2-network"
    }

2. Create a hosted pod w/ that net annotation:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks:  '[
      {
        "name": "l2-network",
        "interface": "net1",
        "ips": [
          "192.0.2.22/24"
          ]
      }
    ]'
  name: debug-ovnl2-c
  namespace: default
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
    runAsNonRoot: true
    runAsUser: 1000
  containers:
  - name: debug-ovnl2-c
    command:
    - /usr/bin/bash
    - -x
    - -c
    - |
      sleep infinity
    image: quay.io/cloud-bulldozer/uperf:latest
    imagePullPolicy: Always
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
  nodeSelector:
    kubernetes.io/hostname: kv1-a8a5d7f1-9xwm4

3. The pod remains in ContainerCreating because it cannot create the net1 interface; the pod describe event logs show:

Events:
  Type     Reason                  Age    From               Message
  ----     ------                  ----   ----               -------
  Normal   Scheduled               4m21s  default-scheduler  Successfully assigned default/debug-ovnl2-c to kv1-a8a5d7f1-9xwm4
  Warning  FailedCreatePodSandBox  2m20s  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_debug-ovnl2-c_default_1b42bc5a-1148-49d8-a2d0-7689a46f59ea_0(1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73): error adding pod default_debug-ovnl2-c to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73 Netns:/var/run/netns/5da048e3-b534-481d-acc6-2ddc6a439586 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=debug-ovnl2-c;K8S_POD_INFRA_CONTAINER_ID=1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73;K8S_POD_UID=1b42bc5a-1148-49d8-a2d0-7689a46f59ea Path: StdinData:[123 34 98 105 110 68 105 114 34 58 34 47 118 97 114 47 108 105 98 47 99 110 105 47 98 105 110 34 44 34 99 104 114 111 111 116 68 105 114 34 58 34 47 104 111 115 116 114 111 111 116 34 44 34 99 108 117 115 116 101 114 78 101 116 119 111 114 107 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 47 49 48 45 111 118 110 45 107 117 98 101 114 110 101 116 101 115 46 99 111 110 102 34 44 34 99 110 105 67 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 97 101 109 111 110 83 111 99 107 101 116 68 105 114 34 58 34 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 103 108 111 98 97 108 78 97 109 101 115 112 97 99 101 115 34 58 34 100 101 102 97 117 108 116 44 111 112 101 110 115 104 105 102 116 45 109 117 108 116 117 115 44 111 112 101 110 115 104 105 102 116 45 115 114 105 111 118 45 110 101 116 119 111 114 107 45 111 112 101 114 97 116 111 114 34 44 34 108 111 103 76 101 118 101 108 34 58 34 118 101 114 98 111 115 101 34 44 34 108 111 103 84 111 83 116 100 101 114 114 34 58 116 114 117 101 44 34 109 117 108 116 117 115 65 117 116 111 99 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 34 44 34 109 117 108 116 117 115 67 111 110 102 105 103 70 105 108 101 34 58 34 97 117 116 111 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 116 119 111 114 107 34 44 34 110 97 109 101 115 112 97 99 101 73 115 111 108 97 116 105 111 110 34 58 116 114 117 101 44 34 112 101 114 78 111 100 101 67 101 114 116 105 102 105 99 97 116 101 34 58 123 34 98 111 111 116 115 116 114 97 112 75 117 98 101 99 111 110 102 105 103 34 58 34 47 118 97 114 47 108 105 98 47 107 117 98 101 108 101 116 47 107 117 98 101 99 111 110 102 105 103 34 44 34 99 101 114 116 68 105 114 34 58 34 47 101 116 99 47 99 110 105 47 109 117 108 116 117 115 47 99 101 114 116 115 34 44 34 99 101 114 116 68 117 114 97 116 105 111 110 34 58 34 50 52 104 34 44 34 101 110 97 98 108 101 100 34 58 116 114 117 101 125 44 34 115 111 99 107 101 116 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 116 121 112 101 34 58 34 109 117 108 116 117 115 45 115 104 105 109 34 125]} ContainerID:"1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73" Netns:"/var/run/netns/5da048e3-b534-481d-acc6-2ddc6a439586" IfName:"eth0" 
Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=debug-ovnl2-c;K8S_POD_INFRA_CONTAINER_ID=1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73;K8S_POD_UID=1b42bc5a-1148-49d8-a2d0-7689a46f59ea" Path:"" ERRORED: error configuring pod [default/debug-ovnl2-c] networking: [default/debug-ovnl2-c/1b42bc5a-1148-49d8-a2d0-7689a46f59ea:l2-network]: error adding container to network "l2-network": CNI request failed with status 400: '[default/debug-ovnl2-c 1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73 network l2-network NAD default/l2-network] [default/debug-ovnl2-c 1e2d9008074c3c5af5ccbb2e7e2e7ca2466395b642a1677db2dfadd35eb84b73 network l2-network NAD default/l2-network] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
'
  Warning  FailedCreatePodSandBox  19s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_debug-ovnl2-c_default_1b42bc5a-1148-49d8-a2d0-7689a46f59ea_0(48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb): error adding pod default_debug-ovnl2-c to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: '&{ContainerID:48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb Netns:/var/run/netns/cae8fab7-80c2-40b7-b1a7-49c8fc8732b2 IfName:eth0 Args:IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=debug-ovnl2-c;K8S_POD_INFRA_CONTAINER_ID=48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb;K8S_POD_UID=1b42bc5a-1148-49d8-a2d0-7689a46f59ea Path: StdinData:[123 34 98 105 110 68 105 114 34 58 34 47 118 97 114 47 108 105 98 47 99 110 105 47 98 105 110 34 44 34 99 104 114 111 111 116 68 105 114 34 58 34 47 104 111 115 116 114 111 111 116 34 44 34 99 108 117 115 116 101 114 78 101 116 119 111 114 107 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 47 49 48 45 111 118 110 45 107 117 98 101 114 110 101 116 101 115 46 99 111 110 102 34 44 34 99 110 105 67 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 101 116 99 47 99 110 105 47 110 101 116 46 100 34 44 34 99 110 105 86 101 114 115 105 111 110 34 58 34 48 46 51 46 49 34 44 34 100 97 101 109 111 110 83 111 99 107 101 116 68 105 114 34 58 34 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 103 108 111 98 97 108 78 97 109 101 115 112 97 99 101 115 34 58 34 100 101 102 97 117 108 116 44 111 112 101 110 115 104 105 102 116 45 109 117 108 116 117 115 44 111 112 101 110 115 104 105 102 116 45 115 114 105 111 118 45 110 101 116 119 111 114 107 45 111 112 101 114 97 116 111 114 34 44 34 108 111 103 76 101 118 101 108 34 58 34 118 101 114 98 111 115 101 34 44 34 108 111 103 84 111 83 116 100 101 114 114 34 58 116 114 117 101 44 34 109 117 108 116 117 115 65 117 116 111 99 111 110 102 105 103 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 99 110 105 47 110 101 116 46 100 34 44 34 109 117 108 116 117 115 67 111 110 102 105 103 70 105 108 101 34 58 34 97 117 116 111 34 44 34 110 97 109 101 34 58 34 109 117 108 116 117 115 45 99 110 105 45 110 101 116 119 111 114 107 34 44 34 110 97 109 101 115 112 97 99 101 73 115 111 108 97 116 105 111 110 34 58 116 114 117 101 44 34 112 101 114 78 111 100 101 67 101 114 116 105 102 105 99 97 116 101 34 58 123 34 98 111 111 116 115 116 114 97 112 75 117 98 101 99 111 110 102 105 103 34 58 34 47 118 97 114 47 108 105 98 47 107 117 98 101 108 101 116 47 107 117 98 101 99 111 110 102 105 103 34 44 34 99 101 114 116 68 105 114 34 58 34 47 101 116 99 47 99 110 105 47 109 117 108 116 117 115 47 99 101 114 116 115 34 44 34 99 101 114 116 68 117 114 97 116 105 111 110 34 58 34 50 52 104 34 44 34 101 110 97 98 108 101 100 34 58 116 114 117 101 125 44 34 115 111 99 107 101 116 68 105 114 34 58 34 47 104 111 115 116 47 114 117 110 47 109 117 108 116 117 115 47 115 111 99 107 101 116 34 44 34 116 121 112 101 34 58 34 109 117 108 116 117 115 45 115 104 105 109 34 125]} ContainerID:"48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb" Netns:"/var/run/netns/cae8fab7-80c2-40b7-b1a7-49c8fc8732b2" IfName:"eth0" 
Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=debug-ovnl2-c;K8S_POD_INFRA_CONTAINER_ID=48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb;K8S_POD_UID=1b42bc5a-1148-49d8-a2d0-7689
a46f59ea" Path:"" ERRORED: error configuring pod [default/debug-ovnl2-c] networking: [default/debug-ovnl2-c/1b42bc5a-1148-49d8-a2d0-7689a46f59ea:l2-network]: error adding container to network "l2-network": CNI request failed with status 400: '[default/debug-ovnl2-c 48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb network l2-network NAD default/l2-network] [default/debug-ovnl2-c 48110f0ecc0979992108e4441ff06f50c0d90f527cbe0b8fe1ca18d5398b67eb network l2-network NAD default/l2-network] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
'
'
  Normal  AddedInterface  18s (x3 over 4m20s)  multus  Add eth0 [10.133.0.21/23] from ovn-kubernetes 

Actual results:

Pod cannot start

Expected results:

Pod can start with additional "ovn-k8s-cni-overlay" network

Additional info:

Slack thread: https://redhat-internal.slack.com/archives/C02UVQRJG83/p1698857051578159
I did confirm the same NAD and pod definition start fine on the management cluster. 

This is a clone of issue OCPBUGS-22699. The following is the description of the original issue:

Description of problem:

A new deployment of BM IPI using a provisioning network with IPv6 shows an error like:

http://XXXX:XXXX:XXXX:XXXX::X:6180/images/ironic-python-agent.kernel....
connection timed out (http://ipxe.org/4c0a6092)

Version-Release number of selected component (if applicable):

Openshift 4.12.32
Also seen in Openshift 4.14.0-rc.5 when adding new nodes

How reproducible:

Very frequent

Steps to Reproduce:

1. Deploy cluster using BM with provided config
2.
3.

Actual results:

Consistent failures depending of the version of OCP used to deploy

Expected results:

No error, successful deployment

Additional info:

Things checked while the bootstrap host is active and the installation information is still valid (and failing):
- tried downloading the "ironic-python-agent.kernel" file from different places (bootstrap, bastion hosts, another provisioned host) and in all cases it worked:
[core@control-1-ru2 ~]$ curl -6 -v -o ironic-python-agent.kernel http://[XXXX:XXXX:XXXX:XXXX::X]:80/images/ironic-python-agent.kernel
\*   Trying XXXX:XXXX:XXXX:XXXX::X...
\* TCP_NODELAY set
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                               Dload  Upload   Total   Spent    Left  Speed
0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Connected to XXXX:XXXX:XXXX:XXXX::X (xxxx:xxxx:xxxx:xxxx::x) port 80   #0)
> GET /images/ironic-python-agent.kernel HTTP/1.1
> Host: [xxxx:xxxx:xxxx:xxxx::x]
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Date: Fri, 27 Oct 2023 08:28:09 GMT
< Server: Apache
< Last-Modified: Thu, 26 Oct 2023 08:42:16 GMT
< ETag: "a29d70-6089a8c91c494"
< Accept-Ranges: bytes
< Content-Length: 10657136
<
{ [14084 bytes data]
100 10.1M  100 10.1M    0     0   597M      0 --:--:-- --:--:-- --:--:--  597M
\* Connection #0 to host xxxx:xxxx:xxxx:xxxx::x left intact

This verifies some of the components like the network setup and the httpd service running on ironic pods.

- Also gathered a listing of the contents of the ironic pod running in podman, especially the shared directory. The contents of /shared/html/inspector.ipxe seem correct compared to a working installation, and all files look to be in place.

- Logs from the ironic container show the errors coming from the node being deployed; the curl log is also shown here for comparison:

xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:19:55 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:19:55 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"
xxxx:xxxx:xxxx:xxxx::x - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 200 10657136 "-" "curl/7.61.1"
cxxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"
xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx:xxxx - - [27/Oct/2023:08:20:23 +0000] "GET /images/ironic-python-agent.kernel HTTP/1.1" 400 226 "-" "iPXE/1.0.0+ (4bd064de)"

This looks like an issue with iPXE and IPv6.

 

 

Description of problem:

After upgrading the cluster from 4.10.47 to 4.11.25, an issue is observed with egress router pods: the pods are stuck in Pending state.

Version-Release number of selected component (if applicable):

4.11.25

How reproducible:

 

Steps to Reproduce:

1. Upgrade from 4.10.47 to 4.11.25
2. Check that the network cluster operator is in Managed state.
3. Verify that egress pods are not created, with errors like:
55s         Warning   FailedCreatePodSandBox   pod/******     (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox *******_d6918859-a4e9-4e5b-ba44-acc70499fa7c_0(9c464935ebaeeeab7be0b056c3f7ed1b7279e21445b9febea29eb280f7ee7429): error adding pod ****** to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [ns/pod/d6918859-a4e9-4e5b-ba44-acc70499fa7c:openshift-sdn]: error adding container to network "openshift-sdn": CNI request failed with status 400: 'could not open netns "/var/run/netns/503fb77f-3b96-4f23-8356-43e7ae1e1b49": unknown FS magic on "/var/run/netns/503fb77f-3b96-4f23-8356-43e7ae1e1b49": 1021994
 

Actual results:

Egress router pods in pending state with error message as below:
$ omg get events 
...
49s        Warning  FailedCreatePodSandBox  pod/xxxx  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_xxxx_379fa7ec-4702-446c-9162-55c2f76989f6_0(86f8c76e9724216143bef024996cb14a7614d3902dcf0d3b7ea858298766630c): error adding pod xxx to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [xxxx/xxxx/379fa7ec-4702-446c-9162-55c2f76989f6:openshift-sdn]: error adding container to network "openshift-sdn": CNI request failed with status 400: 'could not open netns "/var/run/netns/0d39f378-29fd-4858-a947-51c5c06f1598": unknown FS magic on "/var/run/netns/0d39f378-29fd-4858-a947-51c5c06f1598": 1021994

Expected results:

Egress router pods in running state

Additional info:

The workaround from https://access.redhat.com/solutions/6986283 works:
Edit the sdn DaemonSet in the openshift-sdn namespace and change the mountPath from /var/run/netns to /host/var/run/netns:
- mountPath: /host/var/run/netns
  mountPropagation: HostToContainer
  name: host-run-netns
  readOnly: true
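A quick way to confirm the DaemonSet picked up the changed mount (an illustrative check, not part of the referenced solution article):

# Show the netns volume mount of the sdn DaemonSet; it should now point at /host/var/run/netns
oc -n openshift-sdn get ds sdn -o yaml | grep -B1 -A3 'host-run-netns'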

Please review the following PR: https://github.com/openshift/router/pull/455

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The reconciler removes the overlappingrangeipreservations.whereabouts.cni.cncf.io resources whether the pod is alive or not. 

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Create pods and check the overlappingrangeipreservations.whereabouts.cni.cncf.io resources:

$ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -A
NAMESPACE          NAME                      AGE
openshift-multus   2001-1b70-820d-4b04--13   4m53s
openshift-multus   2001-1b70-820d-4b05--13   4m49s

2. Verify that the ip-reconciler cronjob removes the overlappingrangeipreservations.whereabouts.cni.cncf.io resources when it runs:

$ oc get cronjob -n openshift-multus
NAME            SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
ip-reconciler   */15 * * * *   False     0        14m             4d13h

$ oc get overlappingrangeipreservations.whereabouts.cni.cncf.io -A
No resources found

$ oc get cronjob -n openshift-multus
NAME            SCHEDULE       SUSPEND   ACTIVE   LAST SCHEDULE   AGE
ip-reconciler   */15 * * * *   False     0        5s              4d13h

 

Actual results:

The overlappingrangeipreservations.whereabouts.cni.cncf.io resources are removed by the ip-reconciler cronjob for each created pod, so the "overlapping ranges" are not used.

Expected results:

The overlappingrangeipreservations.whereabouts.cni.cncf.io resources should not be removed, regardless of whether a pod has used an IP in the overlapping ranges.

Additional info:

 

 We are investigating issues with storage usage in production. Reverting until we have a root cause

Description of problem:
When "Service Binding Operator" is successfully installed in the cluster for the first time, the page automatically redirects to the Operator installation page with the error message "A subscription for this Operator already exists in Namespace "XXX"".

Notice: This issue only happens when the user installs "Service Binding Operator" for the first time. If the user uninstalls and re-installs the operator, the issue is gone.

Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2022-08-12-053438

How reproducible:
Always

Steps to Reproduce:

  1. Login to OCP web console. Go to Operators -> OperatorHub page
  2. Install "Service Binding Operator", wait until finish, check the page
  3.  

Actual results:
The page will redirect to Operator installation page with the error message "A subscription for this Operator already exists in Namespace "XXX" " 
 
Expected results:
The page should stay on the install page, with the message "Installed operator- ready for use"

Additional info:

Please find the attached screenshot for more details.

Bugs are required for all 4.14 merges right now due to instability. We need to bump the version of the CVO so that the version is consistent with the cluster being installed.

Description of problem:

While reviewing PRs in CoreDNS 1.11.0, we stumbled upon https://github.com/coredns/coredns/pull/6179, which describes a CoreDNS crash in the kubernetes plugin if you create an EndpointSlice object that contains a port without a port number.

I reproduced this myself and was able to successfully bring down all of CoreDNS so that the cluster was put into a degraded state.

We've bumped to CoreDNS 1.11.1 in 4.15, so this is a concern for < 4.15.

Version-Release number of selected component (if applicable):

Less than or equal to 4.14

How reproducible:

100%

Steps to Reproduce:

1. Create an endpointslice with a port with no port number:

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: example-abc
addressType: IPv4
ports:
  - name: ""

2. Shortly after creating this object, all DNS pods continuously crash (a recovery sketch is included under Additional info below):
oc get -n openshift-dns pods
NAME                  READY   STATUS             RESTARTS     AGE
dns-default-57lmh     1/2     CrashLoopBackOff   1 (3s ago)   79m
dns-default-h6cvm     1/2     CrashLoopBackOff   1 (4s ago)   79m
dns-default-mn7qd     1/2     CrashLoopBackOff   1 (3s ago)   79m
dns-default-mxq5g     1/2     CrashLoopBackOff   1 (3s ago)   79m
dns-default-wdrff     1/2     CrashLoopBackOff   1 (3s ago)   79m
dns-default-zs7cd     1/2     CrashLoopBackOff   1 (3s ago)   79m

Actual results:

DNS Pods crash

Expected results:

DNS Pods should NOT crash

Additional info:
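A minimal recovery sketch, assuming the offending object is the one created in step 1; the DNS pods should stop crashing once it is deleted and the crash-loop backoff expires:

# Remove the malformed EndpointSlice created above
oc delete endpointslice example-abc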

 

Description of problem:

When using the command `oc-mirror list operators --catalog=registry.redhat.io/redhat/certified-operator-index:v4.12 -v 9`, the response code is 200 OK at the beginning; then the command hangs for a while and eventually gets a 401 response code.

Version-Release number of selected component (if applicable):


How reproducible:

sometimes

Steps to Reproduce:

Using the advanced cluster management package as an example.

1. oc-mirror list operators --catalog=registry.redhat.io/redhat/certified-operator-index:v4.12 -v 9


Actual results: After hanging for a while, the command gets a 401 response code; it seems that when the request times out and oc-mirror retries, it forgets to re-read the credentials.

level=debug msg=fetch response received digest=sha256:a67257cfe913ad09242bf98c44f2330ec7e8261ca3a8db3431cb88158c3d4837 mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.age=714959 response.header.connection=keep-alive response.header.content-length=80847073 response.header.content-type=binary/octet-stream response.header.date=Mon, 06 Feb 2023 06:52:06 GMT response.header.etag="a428fafd37ee58f4bdeae1a7ff7235b5-1" response.header.last-modified=Fri, 16 Sep 2022 17:54:09 GMT response.header.server=AmazonS3 response.header.via=1.1 010c0731b9775a983eceaec0f5fa6a2e.cloudfront.net (CloudFront) response.header.x-amz-cf-id=rEfKWnJdasWIKnjWhYyqFn9eHY8v_3Y9WwSRnnkMTkPayHlBxWX1EQ== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-storage-class=INTELLIGENT_TIERING response.header.x-amz-version-id=GfqTTjWbdqB0sreyjv3fyo1k6LQ9kZKC response.header.x-cache=Hit from cloudfront response.status=200 OK size=80847073 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:a67257cfe913ad09242bf98c44f2330ec7e8261ca3a8db3431cb88158c3d4837
level=debug msg=fetch response received digest=sha256:d242c7b4380d3c9db3ac75680c35f5c23639a388ad9313f263d13af39a9c8b8b mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.age=595868 response.header.connection=keep-alive response.header.content-length=98028196 response.header.content-type=binary/octet-stream response.header.date=Tue, 07 Feb 2023 15:56:56 GMT response.header.etag="f702c84459b479088565e4048a890617-1" response.header.last-modified=Wed, 18 Jan 2023 06:55:12 GMT response.header.server=AmazonS3 response.header.via=1.1 7f5e0d3b9ea85d0d75063a66c0ebc840.cloudfront.net (CloudFront) response.header.x-amz-cf-id=Tw9cjJjYCy8idBiQ1PvljDkhAoEDEzuDCNnX6xJub4hGeh8V0CIP_A== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-storage-class=INTELLIGENT_TIERING response.header.x-amz-version-id=nt7yY.YmjWF0pfAhzh_fH2xI_563GnPz response.header.x-cache=Hit from cloudfront response.status=200 OK size=98028196 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:d242c7b4380d3c9db3ac75680c35f5c23639a388ad9313f263d13af39a9c8b8b
level=debug msg=fetch response received digest=sha256:664a8226a152ea0f1078a417f2ec72d3a8f9971e8a374859b486b60049af9f18 mediatype=application/vnd.docker.container.image.v1+json response.header.accept-ranges=bytes response.header.age=17430 response.header.connection=keep-alive response.header.content-length=24828 response.header.content-type=binary/octet-stream response.header.date=Tue, 14 Feb 2023 08:37:35 GMT response.header.etag="57eb6fdca8ce82a837bdc2cebadc3c7b-1" response.header.last-modified=Mon, 13 Feb 2023 16:11:57 GMT response.header.server=AmazonS3 response.header.via=1.1 0c96ded7ff282d2dbcf47c918b6bb500.cloudfront.net (CloudFront) response.header.x-amz-cf-id=w9zLDWvPJ__xbTpI8ba5r9DRsFXbvZ9rSx5iksG7lFAjWIthuokOsA== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-version-id=Enw8mLebn4.ShSajtLqdo4riTDHnVEFZ response.header.x-cache=Hit from cloudfront response.status=200 OK size=24828 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:664a8226a152ea0f1078a417f2ec72d3a8f9971e8a374859b486b60049af9f18
level=debug msg=fetch response received digest=sha256:130c9d0ca92e54f59b68c4debc5b463674ff9555be1f319f81ca2f23e22de16f mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.accept-ranges=bytes response.header.age=829779 response.header.connection=keep-alive response.header.content-length=26039246 response.header.content-type=binary/octet-stream response.header.date=Sat, 04 Feb 2023 22:58:25 GMT response.header.etag="a08688b701b31515c6861c69e4d87ebd-1" response.header.last-modified=Tue, 06 Dec 2022 20:50:51 GMT response.header.server=AmazonS3 response.header.via=1.1 000f4a2f631bace380a0afa747a82482.cloudfront.net (CloudFront) response.header.x-amz-cf-id=S-h31zheAEOhOs6uH52Rpq0ZnoRRdd5VfaqVbZWXzAX-Zym-0XtuKA== response.header.x-amz-cf-pop=HIO50-C1 response.header.x-amz-replication-status=COMPLETED response.header.x-amz-server-side-encryption=AES256 response.header.x-amz-storage-class=INTELLIGENT_TIERING response.header.x-amz-version-id=BQOjon.COXTTON_j20wZbWWoDEmGy1__ response.header.x-cache=Hit from cloudfront response.status=200 OK size=26039246 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:130c9d0ca92e54f59b68c4debc5b463674ff9555be1f319f81ca2f23e22de16f




level=debug msg=do request digest=sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9 mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip request.header.accept=application/vnd.docker.image.rootfs.diff.tar.gzip, */* request.header.range=bytes=13417268- request.header.user-agent=opm/alpha request.method=GET size=91700480 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9
level=debug msg=fetch response received digest=sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9 mediatype=application/vnd.docker.image.rootfs.diff.tar.gzip response.header.cache-control=max-age=0, no-cache, no-store response.header.connection=keep-alive response.header.content-length=99 response.header.content-type=application/json response.header.date=Tue, 14 Feb 2023 13:34:06 GMT response.header.docker-distribution-api-version=registry/2.0 response.header.expires=Tue, 14 Feb 2023 13:34:06 GMT response.header.pragma=no-cache response.header.registry-proxy-request-id=0d7ea55f-e96d-4311-885a-125b32c8e965 response.header.www-authenticate=Bearer realm="https://registry.redhat.io/auth/realms/rhcc/protocol/redhat-docker-v2/auth",service="docker-registry",scope="repository:redhat/certified-operator-index:pull" response.status=401 Unauthorized size=91700480 url=https://registry.redhat.io/v2/redhat/certified-operator-index/blobs/sha256:db8e9d2f583af66157f383f9ec3628b05fa0adb0d837269bc9f89332c65939b9.

Expected results:

The command should always (re-)read the credentials when retrying.

 

Description of problem:

The upgrade to 4.14.0-ec.2 from 4.14.0-ec.1 was blocked by the error message on the UI:

Could not update rolebinding "openshift-monitoring/cluster-monitoring-operator-techpreview-only" (531 of 993): the object is invalid, possibly due to local cluster configuration

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:

Unblocked by:

oc --context build02 delete rolebinding cluster-monitoring-operator-techpreview-only -n openshift-monitoring --as system:admin
rolebinding.rbac.authorization.k8s.io "cluster-monitoring-operator-techpreview-only" deleted

Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-18181.

Description of problem:

SSH keys not configured on the worker nodes

Version-Release number of selected component (if applicable):

4.14.0-0.ci-2023-07-14-014011

How reproducible:

so far 100%

Steps to Reproduce:

1. Deploy baremetal cluster using IPI flow
2.
3.

Actual results:

Deployment succeeds but SSH keys not configured on the worker nodes

Expected results:

SSH keys configured on the worker nodes

Additional info:

SSH keys configured on the control-plane nodes
ssh core@master-0-0 'cat .ssh/authorized_keys.d/ignition'
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDm9hb6iTZJypEmzg4IZ767ze60UGhBWnjPXhovWVB7uKputdLzZhmlo36ifkXr/DTk8NGm47r6kXmz9NAF0pDHa5jX6yJFnhS4z5NY/mzsUX41gwiqBKYHgdp/KE1ylE8mbNon5ZpaaGvb876myjjPjPwWsD8hvXZirA5Q8TfDb/Pvgy1dhVH/uN05Ip1vVsp+bFGMPUJVWVUy/Eby5xW6OJv+FBOQq4nu6tslDZlHYXX2TSGrlW4x0i/oQMpKu/Y8ygAdjWqmAy6UBcho1nNWy15cp0jI5Fhjze171vSWZLAqJY+eFcL2kt/09RnY+MXyY/tIf+qNMyBE2Qltigah
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  creationTimestamp: "2023-07-14T12:13:00Z"
  generation: 1
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-worker-ssh
  resourceVersion: "2242"
  uid: 0ef02005-509e-4fc9-91ee-fc0afe27d5e6
spec:
  config:
    ignition:
      version: 3.2.0
    passwd:
      users:
      - name: core
        sshAuthorizedKeys:
        - |
          ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDm9hb6iTZJypEmzg4IZ767ze60UGhBWnjPXhovWVB7uKputdLzZhmlo36ifkXr/DTk8NGm47r6kXmz9NAF0pDHa5jX6yJFnhS4z5NY/mzsUX41gwiqBKYHgdp/KE1ylE8mbNon5ZpaaGvb876myjjPjPwWsD8hvXZirA5Q8TfDb/Pvgy1dhVH/uN05Ip1vVsp+bFGMPUJVWVUy/Eby5xW6OJv+FBOQq4nu6tslDZlHYXX2TSGrlW4x0i/oQMpKu/Y8ygAdjWqmAy6UBcho1nNWy15cp0jI5Fhjze171vSWZLAqJY+eFcL2kt/09RnY+MXyY/tIf+qNMyBE2Qltigah
  extensions: null
  fips: false
  kernelArguments: null
  kernelType: ""
  osImageURL: ""
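For comparison, the equivalent check on a worker node (hostname illustrative) would be expected to show the same key, but on the affected cluster it does not:

ssh core@worker-0-0 'cat .ssh/authorized_keys.d/ignition'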

Description of problem:

Command `$ oc explain route.spec.tls.insecureEdgeTerminationPolicy` shows different values than the actual values.

Version-Release number of selected component (if applicable):

4.10.z

How reproducible:

100%

Steps to Reproduce:

1. $ oc explain route.spec.tls.insecureEdgeTerminationPolicy
KIND:     Route
VERSION:  route.openshift.io/v1

FIELD:    insecureEdgeTerminationPolicy <string>

DESCRIPTION:
     insecureEdgeTerminationPolicy indicates the desired behavior for insecure
     connections to a route. While each router may make its own decisions on
     which ports to expose, this is normally port 80.     
    
     * Allow - traffic is sent to the server on the insecure port (default)
     * Disable - no traffic is allowed on the insecure port.
     * Redirect - clients are redirected to the secure port.

2. Set the option to 'Disable' in any secure route :
   $ oc edit route <route-name>
     spec:
       host: hello.example.com
       port:
         targetPort: https
       tls:
         insecureEdgeTerminationPolicy: Disable

3. After editing the route and setting `insecureEdgeTerminationPolicy: Disable` , it gives error :
Danger alert:An error occurred
Error "Invalid value: "Disable": invalid value for InsecureEdgeTerminationPolicy option, acceptable values are None, Allow, Redirect, or empty" for field "spec.tls.insecureEdgeTerminationPolicy".
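For reference, a minimal sketch of the same route using the accepted value None instead of Disable (host and targetPort reused from step 2):

     spec:
       host: hello.example.com
       port:
         targetPort: https
       tls:
         insecureEdgeTerminationPolicy: None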

Actual results:

Based on the API usage information, the Disable value for the insecureEdgeTerminationPolicy field is not acceptable.

Expected results:

The `oc explain route.spec.tls.insecureEdgeTerminationPolicy` output must show the correct accepted values (None, Allow, Redirect, or empty).

Additional info:

 

Description of problem:

PRs were previously merged to add SC2S support via AWS SDK here:

https://github.com/openshift/installer/pull/5710
https://github.com/openshift/installer/pull/5597
https://github.com/openshift/cluster-ingress-operator/pull/703

However, further updates to add support for the SC2S region (us-isob-east-1) and the new TC2S region (us-iso-west-1) are still required.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

always

Steps to Reproduce:

1. Try to deploy a cluster on us-isob-east-1 or us-iso-west-1
2.
3.

Actual results:

Regions are not supported

Expected results:

 

Additional info:

Both TC2S and SC2S support ALIAS records now.

Description of problem:

We disabled copies of CSVs in our clusters. The list of the installed operators is still visible, but when we go (within the context of some user namespace) to:
Developer Catalog -> Operator Backed
the list is empty.

When we enable the copies of CSVs, then the operator backed catalog shows the expected items.

Version-Release number of selected component (if applicable):

OpenShift 4.13.1

How reproducible:

every time

Steps to Reproduce:

1. Install the Camel K operator (community version, stable channel)
2. Disable copies of CSVs by setting 'OLMConfig.spec.features.disableCopiedCSVs' to 'true' (see the OLMConfig sketch after this list)
3. Create a new namespace/project
4. Go to Developer Catalog -> Operator Backed
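A minimal sketch of the OLMConfig change referenced in step 2 (the cluster-scoped OLMConfig object is named 'cluster'; the apiVersion shown is an assumption):

apiVersion: operators.coreos.com/v1
kind: OLMConfig
metadata:
  name: cluster
spec:
  features:
    disableCopiedCSVs: true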

Actual results:

the Operator Backed Catalog is empty

Expected results:

the Operator Backed Catalog should show Camel-K related items

Additional info:

 

DoD:

Let the HO export a metric with its own version so that, as an SRE, I can easily understand which version is running where by looking at a Grafana dashboard.
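A minimal sketch of such a metric using the Prometheus Go client; the metric name hypershift_operator_info and the version string are assumptions for illustration, not the HO's actual implementation:

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// An "info"-style gauge: the value is always 1 and the operator version is
// carried as a label, so a Grafana dashboard can group and filter by version.
var hoInfo = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "hypershift_operator_info",
		Help: "Constant gauge labelled with the running HyperShift operator version.",
	},
	[]string{"version"},
)

func main() {
	prometheus.MustRegister(hoInfo)
	hoInfo.WithLabelValues("4.14.0").Set(1) // version string is illustrative

	http.Handle("/metrics", promhttp.Handler())
	_ = http.ListenAndServe(":8080", nil)
}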

Description of problem:

Cluster upgrade failure has been affecting three consecutive nightly payloads. 

https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-05-20-041508
https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-05-21-120836
https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-05-22-035713

In all three cases, upgrade seems to fail waiting on network. Take this job as an example:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade/1660495736527130624

Cluster version operator complains about network operator has not finished upgrade:

I0522 07:12:58.540244       1 sync_worker.go:1149] Update error 684 of 845: ClusterOperatorUpdating Cluster operator network is updating versions (*errors.errorString: cluster operator network is available and not degraded but has not finished updating to target version)

This log can been seen in https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade/1660495736527130624/artifacts/e2e-aws-sdn-upgrade/gather-extra/artifacts/pods/openshift-cluster-version_cluster-version-operator-5565f87cc6-6sjqf_cluster-version-operator.log

The network operator keeps waiting with the following log:
I0522 07:12:58.563312       1 connectivity_check_controller.go:166] ConnectivityCheckController is waiting for transition to desired version (4.14.0-0.nightly-2023-05-22-035713) to be completed.

This lasted over 2 hours. The log can be seen in https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-sdn-upgrade/1660495736527130624/artifacts/e2e-aws-sdn-upgrade/gather-extra/artifacts/pods/openshift-network-operator_network-operator-6975b7b8ff-pdxzk_network-operator.log

Compared with a working job, there seems to be an error getting *v1alpha1.PodNetworkConnectivityCheck in the openshift-network-diagnostics_network-check-source:
W0522 04:34:18.527315       1 reflector.go:424] k8s.io/client-go@v12.0.0+incompatible/tools/cache/reflector.go:169: failed to list *v1alpha1.PodNetworkConnectivityCheck: the server could not find the requested resource (get podnetworkconnectivitychecks.controlplane.operator.openshift.io)
E0522 04:34:18.527391       1 reflector.go:140] k8s.io/client-go@v12.0.0+incompatible/tools/cache/reflector.go:169: Failed to watch *v1alpha1.PodNetworkConnectivityCheck: failed to list *v1alpha1.PodNetworkConnectivityCheck: the server could not find the requested resource (get podnetworkconnectivitychecks.controlplane.operator.openshift.io)

It is not clear whether this is really relevant. Also worth mentioning: every time this problem happens, machine-config and dns are also stuck at the older version.

This has affected 4.14 nightly payloads three times. If it shows more consistency, we might have to increase the severity of the bug. Please ping TRT if any more info is needed.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

No timezone info in installer logs

Version-Release number of selected component (if applicable):

4.x

How reproducible:

100%

Steps to Reproduce:

1. openshift-install wait-for install-complete --dir=./foo
2.
3.

Actual results:

INFO Waiting up to 1h0m0s (until 4:52PM) for the cluster at https://api.ocp.example.local:6443 to initialize...

Expected results:

INFO Waiting up to 1h0m0s (until 4:52PM UTC) for the cluster at https://api.ocp.example.local:6443 to initialize...

Additional info:
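A minimal sketch of the formatting difference using Go's standard time package (illustrative only; not the installer's actual code):

package main

import (
	"fmt"
	"time"
)

func main() {
	until := time.Now().Add(time.Hour)

	// Without a zone in the layout, the reader cannot tell which timezone "4:52PM" refers to.
	fmt.Printf("Waiting up to 1h0m0s (until %s) ...\n", until.Format(time.Kitchen))

	// Adding "MST" to the layout renders the zone abbreviation, e.g. "4:52PM UTC".
	fmt.Printf("Waiting up to 1h0m0s (until %s) ...\n", until.Format("3:04PM MST"))
}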

 

This is a clone of issue OCPBUGS-19875. The following is the description of the original issue:

This issue has been updated to capture a larger ongoing issue around console 304 status responses for plugins. This has been observed for ODF, ACM, MCE, monitoring, and other plugins going back to 4.12. Related links:

Original report from this bug:

Description of problem:

Error logs are found under the console pod logs.

Version-Release number of selected component (if applicable):

% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2023-09-27-073353   True        False         37m     Cluster version is 4.15.0-0.nightly-2023-09-27-073353

How reproducible:

100% on IPv6 clusters

Steps to Reproduce:

1. % oc -n openshift-console logs console-6fbf69cc49-7jq5b
...
E0928 00:35:24.098808       1 handlers.go:172] GET request for "monitoring-plugin" plugin failed with 304 status code
E0928 00:35:24.098822       1 utils.go:43] Failed sending HTTP response body: http: request method or response status code does not allow body
E0928 00:35:39.611569       1 handlers.go:172] GET request for "monitoring-plugin" plugin failed with 304 status code
E0928 00:35:39.611583       1 utils.go:43] Failed sending HTTP response body: http: request method or response status code does not allow body
E0928 00:35:54.442150       1 handlers.go:172] GET request for "monitoring-plugin" plugin failed with 304 status code
E0928 00:35:54.442167       1 utils.go:43] Failed sending HTTP response body: http: request method or response status code does not allow body

Actual results:

GET request for "monitoring-plugin" plugin failed with 304 status code

Expected results:

no monitoring-plugin related error logs
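For context, the second error line comes from Go's net/http refusing to write a response body on a 304 status. A minimal sketch that reproduces the same message (not the console's actual handler code):

package main

import (
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/plugin", func(w http.ResponseWriter, r *http.Request) {
		// Sending 304 Not Modified and then attempting to write a body yields
		// "http: request method or response status code does not allow body".
		w.WriteHeader(http.StatusNotModified)
		if _, err := w.Write([]byte("stale plugin manifest")); err != nil {
			log.Printf("Failed sending HTTP response body: %v", err)
		}
	})
	log.Fatal(http.ListenAndServe(":9000", nil))
}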

When implementing support for IPv6-primary dual-stack clusters, we have extended the available IP families to

const (
	IPFamiliesIPv4                 IPFamiliesType = "IPv4"
	IPFamiliesIPv6                 IPFamiliesType = "IPv6"
	IPFamiliesDualStack            IPFamiliesType = "DualStack"
	IPFamiliesDualStackIPv6Primary IPFamiliesType = "DualStackIPv6Primary"
)

At the same time, the definition of the kubelet.service systemd unit still contains the code

{{- if eq .IPFamilies "DualStack"}}
        --node-ip=${KUBELET_NODE_IPS} \
{{- else}}
        --node-ip=${KUBELET_NODE_IP} \
{{- end}}

which only matches the "old" dual-stack family. Because of this, an IPv6-primary dual-stack cluster renders the node-ip parameter with only one IP address instead of the two required for dual-stack.
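A minimal sketch of one possible fix, assuming the template keeps the same variables and simply matches both dual-stack families:

{{- if or (eq .IPFamilies "DualStack") (eq .IPFamilies "DualStackIPv6Primary")}}
        --node-ip=${KUBELET_NODE_IPS} \
{{- else}}
        --node-ip=${KUBELET_NODE_IP} \
{{- end}}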

Description of problem:

An OCP cluster born on 4.1 fails to scale up nodes due to the older podman version 1.0.2 present in the 4.1 bootimage. This was observed while testing bug https://issues.redhat.com/browse/OCPBUGS-7559?focusedCommentId=21889975&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21889975

Journal log:
- Unit machine-config-daemon-update-rpmostree-via-container.service has finished starting up.
--
-- The start-up result is RESULT.
Mar 10 10:41:29 ip-10-0-218-217 podman[18103]: flag provided but not defined: -authfile
Mar 10 10:41:29 ip-10-0-218-217 podman[18103]: See 'podman run --help'.
Mar 10 10:41:29 ip-10-0-218-217 systemd[1]: machine-config-daemon-update-rpmostree-via-container.service: Main process exited, code=exited, status=125/n/a
Mar 10 10:41:29 ip-10-0-218-217 systemd[1]: machine-config-daemon-update-rpmostree-via-container.service: Failed with result 'exit-code'.
Mar 10 10:41:29 ip-10-0-218-217 systemd[1]: machine-config-daemon-update-rpmostree-via-container.service: Consumed 24ms CPU time

Version-Release number of selected component (if applicable):

OCP 4.12 and later

Steps to Reproduce:

1. Upgrade a 4.1-based cluster to 4.12 or a later version
2. Try to scale up a node
3. The node will fail to join

 

Additional info:  https://issues.redhat.com/browse/OCPBUGS-7559?focusedCommentId=21890647&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21890647

Description of problem:

nodeip-configuration.service has failed on cluster nodes:
systemctl status nodeip-configuration.service
× nodeip-configuration.service - Writes IP address configuration so that kubelet and crio services select a valid node IP
     Loaded: loaded (/etc/systemd/system/nodeip-configuration.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Tue 2023-08-15 16:28:09 UTC; 18h ago
   Main PID: 3709 (code=exited, status=0/SUCCESS)
        CPU: 237ms

Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3761]: ++ [[ -z bond0.354 ]]
Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3761]: ++ echo bond0.354
Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3760]: + iface=bond0.354
Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3760]: + echo 'Node IP interface determined as: bond0.354. Enabling IP forwarding...'
Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3760]: Node IP interface determined as: bond0.354. Enabling IP forwarding...
Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3760]: + sysctl -w net.ipv4.conf.bond0.354.forwarding=1
Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com configure-ip-forwarding.sh[3767]: sysctl: cannot stat /proc/sys/net/ipv4/conf/bond0/354/forwarding: No such file or directory
Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com systemd[1]: nodeip-configuration.service: Control process exited, code=exited, status=1/FAILURE
Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com systemd[1]: nodeip-configuration.service: Failed with result 'exit-code'.
Aug 15 16:28:09 openshift-worker-2.lab.eng.tlv2.redhat.com systemd[1]: Failed to start Writes IP address configuration so that kubelet and crio services select a valid node IP.
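The failure comes from the dot in the VLAN interface name: sysctl splits dotted keys on every '.', so bond0.354 is looked up as bond0/354. A minimal illustration of the issue and a common way to keep the dot intact (an assumption about how it could be handled, not the actual fix):

# This splits the interface name and fails:
#   sysctl -w net.ipv4.conf.bond0.354.forwarding=1
# Writing through /proc, or using '/' as the sysctl separator, preserves the dot:
echo 1 > /proc/sys/net/ipv4/conf/bond0.354/forwarding
sysctl -w net/ipv4/conf/bond0.354/forwarding=1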

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-08-005757

How reproducible:

so far once

Steps to Reproduce:

1. Deploy multinode spoke cluster with GitOps-ZTP
2. Configure baremetal network to be on top of vlan interface
              - name: bond0.354
                description: baremetal network
                type: vlan
                state: up
                vlan:
                  base-iface: bond0
                  id: 354
                ipv4:
                  enabled: true
                  dhcp: false
                  address:
                  - ip: 10.x.x.20
                    prefix-length: 26
                ipv6:
                  enabled: false
                  dhcp: false
                  autoconf: false

Actual results:

Cluster is deployed but nodeip-configuration.service is Failed

Expected results:

nodeip-configuration.service is Active

This is a clone of issue OCPBUGS-27307. The following is the description of the original issue:

This is a clone of issue OCPBUGS-27261. The following is the description of the original issue:

Description of problem:

    Environment file /etc/kubernetes/node.env is overwritten after node restart. 

There is a typo in https://github.com/openshift/machine-config-operator/blob/master/templates/common/aws/files/usr-local-bin-aws-kubelet-nodename.yaml where the variable should be changed to NODEENV wherever NODENV is found.

Version-Release number of selected component (if applicable):

    

How reproducible:

  Easy

Steps to Reproduce:

    1. Change contents of /etc/kubernetes/node.env
    2. Restart node
    3. Notice changes are lost
    

Actual results:

  

Expected results:

     /etc/kubernetes/node.env should not be changed after restart of a node

Additional info:

    

Description of problem:

The IPI installation in some regions fails at bootstrap, with no node available/ready.

Version-Release number of selected component (if applicable):

12-22 16:22:27.970  ./openshift-install 4.12.0-0.nightly-2022-12-21-202045
12-22 16:22:27.970  built from commit 3f9c38a5717c638f952df82349c45c7d6964fcd9
12-22 16:22:27.970  release image registry.ci.openshift.org/ocp/release@sha256:2d910488f25e2638b6d61cda2fb2ca5de06eee5882c0b77e6ed08aa7fe680270
12-22 16:22:27.971  release architecture amd64

How reproducible:

Always

Steps to Reproduce:

1. try the IPI installation in the problem regions (so far tried and failed with ap-southeast-2, ap-south-1, eu-west-1, ap-southeast-6, ap-southeast-3, ap-southeast-5, eu-central-1, cn-shanghai, cn-hangzhou and cn-beijing) 

Actual results:

Bootstrap failed to complete

Expected results:

Installation in those regions should succeed.

Additional info:

FYI the QE flexy-install job: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/166672/

No nodes are available/ready, and no operators are available.
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          30m     Unable to apply 4.12.0-0.nightly-2022-12-21-202045: an unknown error has occurred: MultipleErrors
$ oc get nodes
No resources found
$ oc get machines -n openshift-machine-api -o wide
NAME                         PHASE   TYPE   REGION   ZONE   AGE   NODE   PROVIDERID   STATE
jiwei-1222f-v729x-master-0                                  30m                       
jiwei-1222f-v729x-master-1                                  30m                       
jiwei-1222f-v729x-master-2                                  30m                       
$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication
baremetal
cloud-controller-manager                                                                          
cloud-credential                                                                                  
cluster-autoscaler                                                                                
config-operator                                                                                   
console                                                                                           
control-plane-machine-set                                                                         
csi-snapshot-controller                                                                           
dns                                                                                               
etcd                                                                                              
image-registry                                                                                    
ingress                                                                                           
insights                                                                                          
kube-apiserver                                                                                    
kube-controller-manager                                                                           
kube-scheduler                                                                                    
kube-storage-version-migrator                                                                     
machine-api                                                                                       
machine-approver                                                                                  
machine-config                                                                                    
marketplace                                                                                       
monitoring                                                                                        
network                                                                                           
node-tuning                                                                                       
openshift-apiserver                                                                               
openshift-controller-manager                                                                      
openshift-samples                                                                                 
operator-lifecycle-manager                                                                        
operator-lifecycle-manager-catalog                                                                
operator-lifecycle-manager-packageserver
service-ca
storage
$

Master nodes don't run, for example, the kubelet and crio services.
[core@jiwei-1222f-v729x-master-0 ~]$ sudo crictl ps
FATA[0000] unable to determine runtime API version: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/crio/crio.sock: connect: no such file or directory" 
[core@jiwei-1222f-v729x-master-0 ~]$ 

The machine-config-daemon firstboot log reports "failed to update OS".
[jiwei@jiwei log-bundle-20221222085846]$ grep -Ei 'error|failed' control-plane/10.0.187.123/journals/journal.log 
Dec 22 16:24:16 localhost kernel: GPT: Use GNU Parted to correct GPT errors.
Dec 22 16:24:16 localhost kernel: GPT: Use GNU Parted to correct GPT errors.
Dec 22 16:24:18 localhost ignition[867]: failed to fetch config: resource requires networking
Dec 22 16:24:18 localhost ignition[891]: GET error: Get "http://100.100.100.200/latest/user-data": dial tcp 100.100.100.200:80: connect: network is unreachable
Dec 22 16:24:18 localhost ignition[891]: GET error: Get "http://100.100.100.200/latest/user-data": dial tcp 100.100.100.200:80: connect: network is unreachable
Dec 22 16:24:19 localhost.localdomain NetworkManager[919]: <info>  [1671726259.0329] hostname: hostname: hostnamed not used as proxy creation failed with: Could not connect: No such file or directory
Dec 22 16:24:19 localhost.localdomain NetworkManager[919]: <warn>  [1671726259.0464] sleep-monitor-sd: failed to acquire D-Bus proxy: Could not connect: No such file or directory
Dec 22 16:24:19 localhost.localdomain ignition[891]: GET error: Get "https://api-int.jiwei-1222f.alicloud-qe.devcluster.openshift.com:22623/config/master": dial tcp 10.0.187.120:22623: connect: connection refused
...repeated logs omitted...
Dec 22 16:27:46 jiwei-1222f-v729x-master-0 ovs-ctl[1888]: 2022-12-22T16:27:46Z|00001|dns_resolve|WARN|Failed to read /etc/resolv.conf: No such file or directory
Dec 22 16:27:46 jiwei-1222f-v729x-master-0 ovs-vswitchd[1888]: ovs|00001|dns_resolve|WARN|Failed to read /etc/resolv.conf: No such file or directory
Dec 22 16:27:46 jiwei-1222f-v729x-master-0 dbus-daemon[1669]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.resolve1.service': Unit dbus-org.freedesktop.resolve1.service not found.
Dec 22 16:27:46 jiwei-1222f-v729x-master-0 nm-dispatcher[1924]: Error: Device '' not found.
Dec 22 16:27:46 jiwei-1222f-v729x-master-0 nm-dispatcher[1937]: Error: Device '' not found.
Dec 22 16:27:46 jiwei-1222f-v729x-master-0 nm-dispatcher[2037]: Error: Device '' not found.
Dec 22 08:35:32 jiwei-1222f-v729x-master-0 machine-config-daemon[2181]: Warning: failed, retrying in 1s ... (1/2)I1222 08:35:32.477770    2181 run.go:19] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-extensions/os-extensions-content-910221290 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:259d8c6b9ec714d53f0275db9f2962769f703d4d395afb9d902e22cfe96021b0
Dec 22 08:56:06 jiwei-1222f-v729x-master-0 rpm-ostree[2288]: Txn Rebase on /org/projectatomic/rpmostree1/rhcos failed: remote error: Get "https://quay.io/v2/openshift-release-dev/ocp-v4.0-art-dev/blobs/sha256:27f262e70d98996165748f4ab50248671d4a4f97eb67465cd46e1de2d6bd24d0": net/http: TLS handshake timeout
Dec 22 08:56:06 jiwei-1222f-v729x-master-0 machine-config-daemon[2181]: W1222 08:56:06.785425    2181 firstboot_complete_machineconfig.go:46] error: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:411e6e3be017538859cfbd7b5cd57fc87e5fee58f15df19ed3ec11044ebca511 : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:411e6e3be017538859cfbd7b5cd57fc87e5fee58f15df19ed3ec11044ebca511: Warning: The unit file, source configuration file or drop-ins of rpm-ostreed.service changed on disk. Run 'systemctl daemon-reload' to reload units.
Dec 22 08:56:06 jiwei-1222f-v729x-master-0 machine-config-daemon[2181]: error: remote error: Get "https://quay.io/v2/openshift-release-dev/ocp-v4.0-art-dev/blobs/sha256:27f262e70d98996165748f4ab50248671d4a4f97eb67465cd46e1de2d6bd24d0": net/http: TLS handshake timeout
Dec 22 08:57:31 jiwei-1222f-v729x-master-0 machine-config-daemon[2181]: Warning: failed, retrying in 1s ... (1/2)I1222 08:57:31.244684    2181 run.go:19] Running: nice -- ionice -c 3 oc image extract --path /:/run/mco-extensions/os-extensions-content-4021566291 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:259d8c6b9ec714d53f0275db9f2962769f703d4d395afb9d902e22cfe96021b0
Dec 22 08:59:20 jiwei-1222f-v729x-master-0 systemd[2353]: /usr/lib/systemd/user/podman-kube@.service:10: Failed to parse service restart specifier, ignoring: never
Dec 22 08:59:21 jiwei-1222f-v729x-master-0 podman[2437]: Error: open default: no such file or directory
Dec 22 08:59:21 jiwei-1222f-v729x-master-0 podman[2450]: Error: failed to start API service: accept unixgram @00026: accept4: operation not supported
Dec 22 08:59:21 jiwei-1222f-v729x-master-0 systemd[2353]: podman-kube@default.service: Failed with result 'exit-code'.
Dec 22 08:59:21 jiwei-1222f-v729x-master-0 systemd[2353]: Failed to start A template for running K8s workloads via podman-play-kube.
Dec 22 08:59:21 jiwei-1222f-v729x-master-0 systemd[2353]: podman.service: Failed with result 'exit-code'.
[jiwei@jiwei log-bundle-20221222085846]$ 

 

Description of problem:

When installing a new cluster with TechPreviewNoUpgrade featureSet, Nodes never become Ready.

Logs from control-plane components indicate that a resource associated with the DynamicResourceAllocation feature can't be found:

E0804 15:48:51.094383       1 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1alpha2.PodSchedulingContext: failed to list *v1alpha2.PodSchedulingContext: the server could not find the requested resource (get podschedulingcontexts.resource.k8s.io)

It turns out we either need to:

1. Enable the resource.k8s.io/v1alpha2 API in kube-apiserver (see the flag sketch after this list).
2. Or disable the DynamicResourceAllocation feature as TP.
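A minimal sketch of option 1, using the upstream kube-apiserver runtime-config flag described in the linked documentation (how this would be wired through the OpenShift operators is not shown):

--runtime-config=resource.k8s.io/v1alpha2=true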

For now I added a commit to invalidate this feature in o/k and disable all related tests. Please let me know once this is sorted out so that I can drop that commit from the rebase PR.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always when installing a new cluster with TechPreviewNoUpgrade featureSet.

Steps to Reproduce:

1. Install cluster with TechPreviewNoUpgrade featureSet (this can be done passing an install-config.yaml to the installer).
2. Check logs from one the control-plane components.

Actual results:

Nodes are NotReady and ClusterOperators Degraded.

Expected results:

Cluster is installed successfully.

Additional info:

Slack thread: https://redhat-internal.slack.com/archives/C05HQGU8TFF/p1691154653507499

How to enable an API in KAS: https://kubernetes.io/docs/tasks/administer-cluster/enable-disable-api/

Description of problem:

Continuous triggers of the UpgradeBackupController on every z-stream upgrade caused out-of-space issues for some customers.

We decided in a meeting to remove the (undocumented) z-stream upgrade triggers. 
In addition we're going to add a retention mechanism by deleting all files and folders in /etc/kubernetes/cluster-backup before taking a y-stream upgrade backup.

Version-Release number of selected component (if applicable):

4.10 -> 4.14

How reproducible:

always

Steps to Reproduce / expected results:

* do a z-upgrade
* observe that no backup should be taken anymore

* do a y-upgrade
* observe that a backup is still taken
* note that any previous backups should not exist anymore in /etc/kubernetes/cluster-backup

Additional info:

 

 

 

This is a clone of issue OCPBUGS-18517. The following is the description of the original issue:

Description of problem:

Installation with Kuryr is failing because multiple components are attempting to connect to the API and fail with the following error:

failed checking apiserver connectivity: Get "https://172.30.0.1:443/apis/coordination.k8s.io/v1/namespaces/openshift-service-ca/leases/service-ca-controller-lock": tls: failed to verify certificate: x509: cannot validate certificate for 172.30.0.1 because it doesn't contain any IP SANs

$ oc get po -A -o wide |grep -v Running |grep -v Pending |grep -v Completed
NAMESPACE                                          NAME                                                        READY   STATUS             RESTARTS          AGE     IP              NODE                   NOMINATED NODE   READINESS GATES
openshift-apiserver-operator                       openshift-apiserver-operator-559d855c56-c2rdr               0/1     CrashLoopBackOff   42 (2m28s ago)    3h44m   10.128.16.86    kuryr-5sxhw-master-2   <none>           <none>
openshift-apiserver                                apiserver-6b9f5d48c4-bj6s6                                  0/2     CrashLoopBackOff   92 (4m25s ago)    3h36m   10.128.70.10    kuryr-5sxhw-master-2   <none>           <none>
openshift-cluster-csi-drivers                      manila-csi-driver-operator-75b64d8797-fckf5                 0/1     CrashLoopBackOff   42 (119s ago)     3h41m   10.128.56.21    kuryr-5sxhw-master-0   <none>           <none>
openshift-cluster-csi-drivers                      openstack-cinder-csi-driver-operator-84dfd8d89f-kgtr8       0/1     CrashLoopBackOff   42 (82s ago)      3h41m   10.128.56.9     kuryr-5sxhw-master-0   <none>           <none>
openshift-cluster-node-tuning-operator             cluster-node-tuning-operator-7fbb66545c-kh6th               0/1     CrashLoopBackOff   46 (3m5s ago)     3h44m   10.128.6.40     kuryr-5sxhw-master-2   <none>           <none>
openshift-cluster-storage-operator                 cluster-storage-operator-5545dfcf6d-n497j                   0/1     CrashLoopBackOff   42 (2m23s ago)    3h44m   10.128.21.175   kuryr-5sxhw-master-2   <none>           <none>
openshift-cluster-storage-operator                 csi-snapshot-controller-ddb9469f9-bc4bb                     0/1     CrashLoopBackOff   45 (2m17s ago)    3h41m   10.128.20.106   kuryr-5sxhw-master-1   <none>           <none>
openshift-cluster-storage-operator                 csi-snapshot-controller-operator-6d7b66dbdd-xdwcs           0/1     CrashLoopBackOff   42 (92s ago)      3h44m   10.128.21.220   kuryr-5sxhw-master-2   <none>           <none>
openshift-config-operator                          openshift-config-operator-c5d5d964-2w2bv                    0/1     CrashLoopBackOff   80 (3m39s ago)    3h44m   10.128.43.39    kuryr-5sxhw-master-2   <none>           <none>
openshift-controller-manager-operator              openshift-controller-manager-operator-754d748cf7-rzq6f      0/1     CrashLoopBackOff   42 (3m6s ago)     3h44m   10.128.25.166   kuryr-5sxhw-master-2   <none>           <none>
openshift-etcd-operator                            etcd-operator-76ddc94887-zqkn7                              0/1     CrashLoopBackOff   49 (30s ago)      3h44m   10.128.32.146   kuryr-5sxhw-master-2   <none>           <none>
openshift-ingress-operator                         ingress-operator-9f76cf75b-cjx9t                            1/2     CrashLoopBackOff   39 (3m24s ago)    3h44m   10.128.9.108    kuryr-5sxhw-master-2   <none>           <none>
openshift-insights                                 insights-operator-776cd7cfb4-8gzz7                          0/1     CrashLoopBackOff   46 (4m21s ago)    3h44m   10.128.15.102   kuryr-5sxhw-master-2   <none>           <none>
openshift-kube-apiserver-operator                  kube-apiserver-operator-64f4db777f-7n9jv                    0/1     CrashLoopBackOff   42 (113s ago)     3h44m   10.128.18.199   kuryr-5sxhw-master-2   <none>           <none>
openshift-kube-apiserver                           installer-5-kuryr-5sxhw-master-1                            0/1     Error              0                 3h35m   10.128.68.176   kuryr-5sxhw-master-1   <none>           <none>
openshift-kube-controller-manager-operator         kube-controller-manager-operator-746497b-dfbh5              0/1     CrashLoopBackOff   42 (2m23s ago)    3h44m   10.128.13.162   kuryr-5sxhw-master-2   <none>           <none>
openshift-kube-controller-manager                  installer-4-kuryr-5sxhw-master-0                            0/1     Error              0                 3h35m   10.128.65.186   kuryr-5sxhw-master-0   <none>           <none>
openshift-kube-scheduler-operator                  openshift-kube-scheduler-operator-695fb4449f-j9wqx          0/1     CrashLoopBackOff   42 (63s ago)      3h44m   10.128.44.194   kuryr-5sxhw-master-2   <none>           <none>
openshift-kube-scheduler                           installer-5-kuryr-5sxhw-master-0                            0/1     Error              0                 3h35m   10.128.60.44    kuryr-5sxhw-master-0   <none>           <none>
openshift-kube-storage-version-migrator-operator   kube-storage-version-migrator-operator-6c5cd46578-qpk5z     0/1     CrashLoopBackOff   42 (2m18s ago)    3h44m   10.128.4.120    kuryr-5sxhw-master-2   <none>           <none>
openshift-machine-api                              cluster-autoscaler-operator-7b667675db-tmlcb                1/2     CrashLoopBackOff   46 (2m53s ago)    3h45m   10.128.28.146   kuryr-5sxhw-master-2   <none>           <none>
openshift-machine-api                              machine-api-controllers-fdb99649c-ldb7t                     3/7     CrashLoopBackOff   184 (2m55s ago)   3h40m   10.128.29.90    kuryr-5sxhw-master-0   <none>           <none>
openshift-route-controller-manager                 route-controller-manager-d8f458684-7dgjm                    0/1     CrashLoopBackOff   43 (100s ago)     3h36m   10.128.55.11    kuryr-5sxhw-master-2   <none>           <none>
openshift-service-ca-operator                      service-ca-operator-654f68c77f-g4w55                        0/1     CrashLoopBackOff   42 (2m2s ago)     3h45m   10.128.22.30    kuryr-5sxhw-master-2   <none>           <none>
openshift-service-ca                               service-ca-5f584b7d75-mxllm                                 0/1     CrashLoopBackOff   42 (45s ago)      3h42m   10.128.49.250   kuryr-5sxhw-master-0   <none>           <none>
$ oc get svc -A |grep  172.30.0.1 
default                                            kubernetes                                       ClusterIP   172.30.0.1       <none>        443/TCP                           3h50m
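One way to confirm the missing IP SAN on the certificate served at 172.30.0.1 (a diagnostic sketch, run from a node or pod that can reach the service network):

# Dump the Subject Alternative Name extension of the serving certificate;
# the error above implies it contains no IP addresses.
openssl s_client -connect 172.30.0.1:443 </dev/null 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'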

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-21803. The following is the description of the original issue:

Description of problem:

The test case https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-50926 was created for the NE-577 epic. When we increase 'spec.tuningOptions.maxConnections' to 200000, the default ingress controller gets stuck in Progressing.

Version-Release number of selected component (if applicable):

 

How reproducible:

https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-50926

Steps to Reproduce:

1. Edit the default controller with the max value 2000000:
oc -n openshift-ingress-operator edit ingresscontroller default
  tuningOptions:
    maxConnections: 2000000
2. melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get ingresscontroller default -o yaml | grep  -A1 tuningOptions
  tuningOptions:
    maxConnections: 2000000
3. melvinjoseph@mjoseph-mac openshift-tests-private % oc get co/ingress 
NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress   4.15.0-0.nightly-2023-10-16-231617   True        True          False      3h42m   ingresscontroller "default" is progressing: IngressControllerProgressing: One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination......

Actual results:

The default ingress controller is stuck in Progressing.

Expected results:

The ingress controller should work as normal

Additional info:

melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress get po
NAME                              READY   STATUS        RESTARTS   AGE
router-default-7cf67f448-gb7mr    0/1     Running       0          38s
router-default-7cf67f448-qmvks    0/1     Running       0          38s
router-default-7dcd556587-kvk8d   0/1     Terminating   0          3h53m
router-default-7dcd556587-vppk4   1/1     Running       0          3h53m
melvinjoseph@mjoseph-mac openshift-tests-private % 

melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress get po
NAME                              READY   STATUS    RESTARTS   AGE
router-default-7cf67f448-gb7mr    0/1     Running   0          111s
router-default-7cf67f448-qmvks    0/1     Running   0          111s
router-default-7dcd556587-vppk4   1/1     Running   0          3h55m

melvinjoseph@mjoseph-mac openshift-tests-private % oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h28m   
baremetal                                  4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h55m   
cloud-controller-manager                   4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h58m   
cloud-credential                           4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h59m   
cluster-autoscaler                         4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h55m   
config-operator                            4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h56m   
console                                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h34m   
control-plane-machine-set                  4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h43m   
csi-snapshot-controller                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h39m   
dns                                        4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h54m   
etcd                                       4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h47m   
image-registry                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      176m    
ingress                                    4.15.0-0.nightly-2023-10-16-231617   True        True          False      3h39m   ingresscontroller "default" is progressing: IngressControllerProgressing: One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination......
insights                                   4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h49m   
kube-apiserver                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h45m   
kube-controller-manager                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h46m   
kube-scheduler                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h46m   
kube-storage-version-migrator              4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h56m   
machine-api                                4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h45m   
machine-approver                           4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h55m   
machine-config                             4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h53m   
marketplace                                4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h55m   
monitoring                                 4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h35m   
network                                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h57m   
node-tuning                                4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h39m   
openshift-apiserver                        4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h43m   
openshift-controller-manager               4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h39m   
openshift-samples                          4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h39m   
operator-lifecycle-manager                 4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h54m   
operator-lifecycle-manager-catalog         4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h54m   
operator-lifecycle-manager-packageserver   4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h43m   
service-ca                                 4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h56m   
storage                                    4.15.0-0.nightly-2023-10-16-231617   True        False         False      3h36m   
melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get po
NAME                               READY   STATUS    RESTARTS        AGE
ingress-operator-c6fd989fd-jsrzv   2/2     Running   4 (3h45m ago)   3h58m
melvinjoseph@mjoseph-mac openshift-tests-private % 


melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator logs ingress-operator-c6fd989fd-jsrzv -c ingress-operator --tail=20
2023-10-17T11:34:54.327Z    INFO    operator.ingress_controller    handler/enqueue_mapped.go:81    queueing ingress    {"name": "default", "related": ""}
2023-10-17T11:34:54.348Z    INFO    operator.ingress_controller    handler/enqueue_mapped.go:81    queueing ingress    {"name": "default", "related": ""}
2023-10-17T11:34:54.348Z    INFO    operator.ingress_controller    handler/enqueue_mapped.go:81    queueing ingress    {"name": "default", "related": ""}
2023-10-17T11:34:54.394Z    INFO    operator.ingressclass_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.394Z    INFO    operator.route_metrics_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.394Z    INFO    operator.status_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.397Z    INFO    operator.ingress_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.429Z    INFO    operator.status_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.446Z    INFO    operator.certificate_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.553Z    INFO    operator.ingressclass_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.553Z    INFO    operator.route_metrics_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.553Z    INFO    operator.status_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.557Z    ERROR    operator.ingress_controller    controller/controller.go:118    got retryable error; requeueing    {"after": "59m59.9999758s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"}
2023-10-17T11:34:54.558Z    INFO    operator.ingress_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.583Z    INFO    operator.status_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:34:54.657Z    ERROR    operator.ingress_controller    controller/controller.go:118    got retryable error; requeueing    {"after": "59m59.345629987s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"}
2023-10-17T11:34:54.794Z    INFO    operator.certificate_controller    controller/controller.go:118    Reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:36:11.151Z    INFO    operator.ingress_controller    handler/enqueue_mapped.go:81    queueing ingress    {"name": "default", "related": ""}
2023-10-17T11:36:11.151Z    INFO    operator.ingress_controller    controller/controller.go:118    reconciling    {"request": {"name":"default","namespace":"openshift-ingress-operator"}}
2023-10-17T11:36:11.248Z    ERROR    operator.ingress_controller    controller/controller.go:118    got retryable error; requeueing    {"after": "58m42.755479533s", "error": "IngressController may become degraded soon: DeploymentReplicasAllAvailable=False"}
melvinjoseph@mjoseph-mac openshift-tests-private % 

 
melvinjoseph@mjoseph-mac openshift-tests-private % oc get po -n openshift-ingress
NAME                              READY   STATUS    RESTARTS      AGE
router-default-7cf67f448-gb7mr    0/1     Running   1 (71s ago)   3m57s
router-default-7cf67f448-qmvks    0/1     Running   1 (70s ago)   3m57s
router-default-7dcd556587-vppk4   1/1     Running   0             3h57m

melvinjoseph@mjoseph-mac openshift-tests-private %   oc -n openshift-ingress logs router-default-7cf67f448-gb7mr --tail=20 
I1017 11:39:22.623928       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:23.623924       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:24.623373       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:25.627359       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:26.623337       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:27.623603       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:28.623866       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:29.623183       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:30.623475       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:31.623949       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
melvinjoseph@mjoseph-mac openshift-tests-private % 
melvinjoseph@mjoseph-mac openshift-tests-private % 
melvinjoseph@mjoseph-mac openshift-tests-private % 
melvinjoseph@mjoseph-mac openshift-tests-private %   oc -n openshift-ingress logs router-default-7cf67f448-qmvks --tail=20
I1017 11:39:34.553475       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:35.551412       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:36.551421       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
E1017 11:39:37.052068       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I1017 11:39:37.551648       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:38.551632       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:39.551410       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:40.552620       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:41.552050       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:42.551076       1 healthz.go:261] backend-http check failed: healthz
[-]backend-http failed: backend reported failure
I1017 11:39:42.564293       1 template.go:828] router "msg"="Shutdown requested, waiting 45s for new connections to cease" 

melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get ingresscontroller 
NAME      AGE
default   3h59m
melvinjoseph@mjoseph-mac openshift-tests-private % oc -n openshift-ingress-operator get ingresscontroller default -o yaml
apiVersion: operator.openshift.io/v1
<-----snip---->
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-10-17T07:41:42Z"
    reason: Valid
    status: "True"
    type: Admitted
  - lastTransitionTime: "2023-10-17T07:57:01Z"
    message: The deployment has Available status condition set to True
    reason: DeploymentAvailable
    status: "True"
    type: DeploymentAvailable
  - lastTransitionTime: "2023-10-17T07:57:01Z"
    message: Minimum replicas requirement is met
    reason: DeploymentMinimumReplicasMet
    status: "True"
    type: DeploymentReplicasMinAvailable
  - lastTransitionTime: "2023-10-17T11:34:54Z"
    message: 1/2 of replicas are available
    reason: DeploymentReplicasNotAvailable
    status: "False"
    type: DeploymentReplicasAllAvailable
  - lastTransitionTime: "2023-10-17T11:34:54Z"
    message: |
      Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination...
    reason: DeploymentRollingOut
    status: "True"
    type: DeploymentRollingOut
  - lastTransitionTime: "2023-10-17T07:41:43Z"
    message: The endpoint publishing strategy supports a managed load balancer
    reason: WantedByEndpointPublishingStrategy
    status: "True"
    type: LoadBalancerManaged
  - lastTransitionTime: "2023-10-17T07:57:24Z"
    message: The LoadBalancer service is provisioned
    reason: LoadBalancerProvisioned
    status: "True"
    type: LoadBalancerReady
  - lastTransitionTime: "2023-10-17T07:41:43Z"
    message: LoadBalancer is not progressing
    reason: LoadBalancerNotProgressing
    status: "False"
    type: LoadBalancerProgressing
  - lastTransitionTime: "2023-10-17T07:41:43Z"
    message: DNS management is supported and zones are specified in the cluster DNS
      config.
    reason: Normal
    status: "True"
    type: DNSManaged
  - lastTransitionTime: "2023-10-17T07:57:26Z"
    message: The record is provisioned in all reported zones.
    reason: NoFailedZones
    status: "True"
    type: DNSReady
  - lastTransitionTime: "2023-10-17T07:57:26Z"
    status: "True"
    type: Available
  - lastTransitionTime: "2023-10-17T11:34:54Z"
    message: |-
      One or more status conditions indicate progressing: DeploymentRollingOut=True (DeploymentRollingOut: Waiting for router deployment rollout to finish: 1 old replica(s) are pending termination...
      )
    reason: IngressControllerProgressing
    status: "True"
    type: Progressing
  - lastTransitionTime: "2023-10-17T07:57:28Z"
    status: "False"
    type: Degraded
  - lastTransitionTime: "2023-10-17T07:41:43Z"
<-----snip---->

 

Description of the problem:

Staging, BE v2.17.3 - Trying to install an OCP 4.13 Nutanix cluster and getting a "no ingress for host" error. Igal saw that the error is

Warning  FailedScheduling  98m                 default-scheduler  0/5 nodes are available: 2 node(s) didn't match pod anti-affinity rules, 3 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}. preemption: 0/5 nodes are available: 2 node(s) didn't match pod anti-affinity rules, 3 Preemption is not helpful for scheduling..

Which comes from 

removeUninitializedTaint := false
if cluster.Platform != nil && *cluster.Platform.Type == models.PlatformTypeVsphere {
    removeUninitializedTaint = true
}
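A minimal sketch of a possible direction, assuming the assisted-service models package also defines a Nutanix platform constant (the type and constant names below are simplified assumptions, not the actual fix): treat Nutanix like vSphere, since its external cloud provider also taints nodes with node.cloudprovider.kubernetes.io/uninitialized until they are initialized.

package main

import "fmt"

// PlatformType and the constants below stand in for the assisted-service
// models package; the names are assumptions for illustration only.
type PlatformType string

const (
    PlatformTypeVsphere PlatformType = "vsphere"
    PlatformTypeNutanix PlatformType = "nutanix"
)

type Platform struct{ Type *PlatformType }

// shouldRemoveUninitializedTaint reports whether the agent should remove the
// node.cloudprovider.kubernetes.io/uninitialized taint for the given platform.
func shouldRemoveUninitializedTaint(p *Platform) bool {
    if p == nil || p.Type == nil {
        return false
    }
    switch *p.Type {
    case PlatformTypeVsphere, PlatformTypeNutanix:
        return true
    default:
        return false
    }
}

func main() {
    t := PlatformTypeNutanix
    fmt.Println(shouldRemoveUninitializedTaint(&Platform{Type: &t})) // true
}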

How reproducible:

 

Steps to reproduce:

1. 

2.

3.

Actual results:

 

Expected results:

Description of problem:

oc-mirror fails to complete a heads-only mirror, complaining about devworkspace-operator

Version-Release number of selected component (if applicable):

# oc-mirror version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.12.0-202302280915.p0.g3d51740.assembly.stream-3d51740", GitCommit:"3d517407dcbc46ededd7323c7e8f6d6a45efc649", GitTreeState:"clean", BuildDate:"2023-03-01T00:20:53Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

Attempt a heads-only mirroring of registry.redhat.io/redhat/redhat-operator-index:v4.10

Steps to Reproduce:

1. Imageset currently:
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  registry:
    imageURL: myregistry.mydomain:5000/redhat-operators
    skipTLS: false
mirror:
  operators:
  - catalog: registry.redhat.io/redhat/redhat-operator-index:v4.10
2.$ oc mirror --config=./imageset-config.yml docker://otherregistry.mydomain:5000/redhat-operators

Checking push permissions for otherregistry.mydomain:5000
Found: oc-mirror-workspace/src/publish
Found: oc-mirror-workspace/src/v2
Found: oc-mirror-workspace/src/charts
Found: oc-mirror-workspace/src/release-signatures
WARN[0026] DEPRECATION NOTICE:
Sqlite-based catalogs and their related subcommands are deprecated. Support for
them will be removed in a future release. Please migrate your catalog workflows
to the new file-based catalog format. 

The rendered catalog is invalid.

Run "oc-mirror list operators --catalog CATALOG-NAME --package PACKAGE-NAME" for more information.  

error: error generating diff: channel fast: head "devworkspace-operator.v0.19.1-0.1679521112.p" not reachable from bundle "devworkspace-operator.v0.19.1"  

Actual results:

error: error generating diff: channel fast: head "devworkspace-operator.v0.19.1-0.1679521112.p" not reachable from bundle "devworkspace-operator.v0.19.1"

Expected results:

For the catalog to be mirrored.

https://github.com/openshift/hypershift/pull/2437 created a coupling between the HyperShift Operator (HO) and the Control Plane Operator (CPO): a CPO that contains this PR crashes when deployed by an HO that does not.

The reason appears to be related to the absence of the OPENSHIFT_IMG_OVERRIDES envvar on the CPO deployment.

{"level":"info","ts":"2023-06-06T16:36:21Z","logger":"setup","msg":"Using CPO image","image":"registry.ci.openshift.org/ocp/4.14-2023-06-06-102645@sha256:2d81c28856f5c0a73e55e7cb6fbc208c738fb3ca7c200cc7eb46efb40c8e10d2"}
panic: runtime error: index out of range [1] with length 1

goroutine 1 [running]:
github.com/openshift/hypershift/support/util.ConvertImageRegistryOverrideStringToMap({0x0, 0x0})
        /hypershift/support/util/util.go:237 +0x454
main.NewStartCommand.func1(0xc000d80000, {0xc000a71180, 0x0, 0x8})
        /hypershift/control-plane-operator/main.go:345 +0x2225
      containers:
      - args:
        - run
        - --namespace
        - $(MY_NAMESPACE)
        - --deployment-name
        - control-plane-operator
        - --metrics-addr
        - 0.0.0.0:8080
        - --enable-ci-debug-output=false
        - --registry-overrides==
        command:
        - /usr/bin/control-plane-operator
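For illustration only, a defensive Go sketch (not the actual hypershift helper) of parsing the registry-overrides value: entries without both a source and a mirror, such as the bare "=" produced by --registry-overrides==, are skipped instead of indexed, which avoids the index-out-of-range panic above.

package main

import (
    "fmt"
    "strings"
)

// parseRegistryOverrides converts "source1=mirror1,source2=mirror2" into a map,
// silently skipping empty or malformed entries instead of panicking.
func parseRegistryOverrides(raw string) map[string]string {
    overrides := map[string]string{}
    for _, entry := range strings.Split(raw, ",") {
        source, mirror, found := strings.Cut(entry, "=")
        if !found || source == "" || mirror == "" {
            continue // tolerate "" and "=" style entries
        }
        overrides[source] = mirror
    }
    return overrides
}

func main() {
    fmt.Println(parseRegistryOverrides("="))                                // map[]
    fmt.Println(parseRegistryOverrides("quay.io/a=registry.example.com/a")) // map[quay.io/a:registry.example.com/a]
}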

Description of problem:

After an upgrade from 4.9 to 4.10, the collect+ process causes CPU bursts of 5-6 seconds every 15 minutes. During each burst, collect+ consumes 100% CPU.

Top Command Dump Sample:
top - 07:00:04 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
Tasks: 247 total,   1 running, 246 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.3 us,  4.5 sy,  0.0 ni, 80.8 id,  7.4 wa,  0.8 hi,  0.3 si,  0.0 st
MiB Mem :  32151.9 total,  22601.4 free,   2182.1 used,   7368.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29420.7 avail Mem     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   2009 root      20   0 3741252 172136  71396 S  12.9   0.5  36:42.79 kubelet
   1954 root      20   0 2663680 130928  46156 S   7.9   0.4   6:50.44 crio
   9440 root      20   0 1633728 546036  60836 S   7.9   1.7  21:06.80 fluentd
      1 root      20   0  238416  15412   8968 S   5.9   0.0   1:56.73 systemd
   1353 800       10 -10  796808 165380  40916 S   5.0   0.5   2:32.11 ovs-vsw+
   5454 root      20   0 1729112  73680  37404 S   2.0   0.2   3:52.21 coredns
1061248 1000360+  20   0 1113524  24304  17776 S   2.0   0.1   0:00.03 collect+
    306 root       0 -20       0      0      0 I   1.0   0.0   0:00.37 kworker+
    957 root      20   0  264076 126280 119596 S   1.0   0.4   0:06.80 systemd+
   1114 dbus      20   0   83188   6224   5140 S   1.0   0.0   0:04.30 dbus-da+
   5710 root      20   0  406004  31384  15068 S   1.0   0.1   0:04.11 tuned
   6198 nobody    20   0 1632272  46588  20516 S   1.0   0.1   0:17.60 network+
1061291 1000650+  20   0   11896   2748   2496 S   1.0   0.0   0:00.01 bash
1061355 1000650+  20   0   11896   2868   2616 S   1.0   0.0   0:00.01 bashtop - 07:00:05 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
Tasks: 248 total,   2 running, 245 sleeping,   0 stopped,   1 zombie
%Cpu(s): 11.4 us,  2.0 sy,  0.0 ni, 81.5 id,  4.2 wa,  0.6 hi,  0.2 si,  0.0 st
MiB Mem :  32151.9 total,  22601.4 free,   2182.1 used,   7368.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29420.7 avail Mem     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1061248 1000360+  20   0 1484936  36464  21300 S  74.3   0.1   0:00.78 collect+
   9440 root      20   0 1633728 545412  60900 S  11.9   1.7  21:06.92 fluentd
   2009 root      20   0 3741252 172396  71396 S   4.0   0.5  36:42.83 kubelet
      1 root      20   0  238416  15412   8968 S   1.0   0.0   1:56.74 systemd
    300 root       0 -20       0      0      0 I   1.0   0.0   0:00.46 kworker+
   1427 root      20   0   19656   2204   2064 S   1.0   0.0   0:01.55 agetty
   2419 root      20   0 1714748  38812  22884 S   1.0   0.1   0:24.42 coredns+
   2528 root      20   0 1634680  36464  20628 S   1.0   0.1   0:22.01 dynkeep+
1009372 root      20   0       0      0      0 I   1.0   0.0   0:00.42 kworker+
1053353 root      20   0   50200   4012   3292 R   1.0   0.0   0:01.56 toptop - 07:00:06 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
Tasks: 247 total,   1 running, 246 sleeping,   0 stopped,   0 zombie
%Cpu(s): 15.3 us,  1.5 sy,  0.0 ni, 82.7 id,  0.1 wa,  0.2 hi,  0.1 si,  0.0 st
MiB Mem :  32151.9 total,  22595.9 free,   2185.7 used,   7370.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29416.7 avail Mem     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1061248 1000360+  20   0 1484936  35740  21428 S  99.0   0.1   0:01.78 collect+
   2009 root      20   0 3741252 172396  71396 S   3.0   0.5  36:42.86 kubelet
   9440 root      20   0 1633728 545076  60900 S   2.0   1.7  21:06.94 fluentd
   1353 800       10 -10  796808 165380  40916 S   1.0   0.5   2:32.12 ovs-vsw+
   1954 root      20   0 2663680 131452  46156 S   1.0   0.4   6:50.45 crio top - 07:00:07 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
Tasks: 247 total,   1 running, 246 sleeping,   0 stopped,   0 zombie
%Cpu(s): 14.7 us,  1.1 sy,  0.0 ni, 83.6 id,  0.1 wa,  0.4 hi,  0.1 si,  0.0 st
MiB Mem :  32151.9 total,  22595.9 free,   2185.7 used,   7370.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29416.7 avail Mem     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1061248 1000360+  20   0 1484936  35236  21492 S 102.0   0.1   0:02.80 collect+
   2009 root      20   0 3741252 172660  71396 S   7.0   0.5  36:42.93 kubelet
   3288 nobody    20   0  718964  30648  11680 S   3.0   0.1   3:36.84 node_ex+
      1 root      20   0  238416  15412   8968 S   1.0   0.0   1:56.75 systemd
   1353 800       10 -10  796808 165380  40916 S   1.0   0.5   2:32.13 ovs-vsw+
   1954 root      20   0 2663680 131452  46156 S   1.0   0.4   6:50.46 crio
   5454 root      20   0 1729112  73680  37404 S   1.0   0.2   3:52.22 coredns
   9440 root      20   0 1633728 545080  60900 S   1.0   1.7  21:06.95 fluentd
1053353 root      20   0   50200   4012   3292 R   1.0   0.0   0:01.57 toptop - 07:00:08 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
Tasks: 247 total,   2 running, 245 sleeping,   0 stopped,   0 zombie
%Cpu(s): 14.2 us,  0.9 sy,  0.0 ni, 84.5 id,  0.0 wa,  0.2 hi,  0.1 si,  0.0 st
MiB Mem :  32151.9 total,  22595.9 free,   2185.7 used,   7370.2 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29416.7 avail Mem     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1061248 1000360+  20   0 1484936  35164  21492 S 100.0   0.1   0:03.81 collect+
   2009 root      20   0 3741252 172660  71396 S   3.0   0.5  36:42.96 kubelet
1061543 1000650+  20   0   34564   9804   5772 R   3.0   0.0   0:00.03 python
   9440 root      20   0 1633728 543952  60900 S   2.0   1.7  21:06.97 fluentd
1053353 root      20   0   50200   4012   3292 R   2.0   0.0   0:01.59 top
   2330 root      20   0 1654612  61260  34720 S   1.0   0.2   0:55.81 coredns
   8023 root      20   0   12056   3044   2580 S   1.0   0.0   0:24.59 install+top - 07:00:09 up 10:10,  0 users,  load average: 0.34, 0.27, 0.28
Tasks: 235 total,   2 running, 233 sleeping,   0 stopped,   0 zombie
%Cpu(s):  8.9 us,  3.2 sy,  0.0 ni, 85.6 id,  1.5 wa,  0.5 hi,  0.2 si,  0.0 st
MiB Mem :  32151.9 total,  22621.0 free,   2160.5 used,   7370.4 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29441.9 avail Mem     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   2009 root      20   0 3741252 172660  71396 S   5.0   0.5  36:43.01 kubelet
   9440 root      20   0 1633728 542684  60900 S   4.0   1.6  21:07.01 fluentd
   1353 800       10 -10  796808 165380  40916 S   2.0   0.5   2:32.15 ovs-vsw+
      1 root      20   0  238416  15412   8968 S   1.0   0.0   1:56.76 systemd
   1954 root      20   0 2663680 131452  46156 S   1.0   0.4   6:50.47 crio
   5454 root      20   0 1729112  73680  37404 S   1.0   0.2   3:52.23 coredns
   6198 nobody    20   0 1632272  45936  20516 S   1.0   0.1   0:17.61 network+
   7016 root      20   0   12052   3204   2736 S   1.0   0.0   0:24.19 install+

Version-Release number of selected component (if applicable):

 

How reproducible:

The lab environment does not present the same behavior.

Steps to Reproduce:

1.
2.
3.

Actual results:

Regular high CPU spikes

Expected results:

No CPU spikes

Additional info:

Provided logs:
1-) top command dump uploaded to SF case 03317387
2-) must-gather uploaded to SF case 03317387

 

Description of problem:

The CredentialsRequest for Azure AD Workload Identity is missing the disk encryption set read permission:

- Microsoft.Compute/diskEncryptionSets/read

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

Every time a machine is created with a disk encryption set

Steps to Reproduce:

1. Create workload identity cluster
2. Create keyvault and secret within keyvault
3. Create disk encryption set and point it to keyvault; can use system-assigned identity 
4. Create or modify existing machineset to include a disk encryption set.  
            managedDisk:
              diskEncryptionSet:
                id: /subscriptions/<subscription_id>/resourceGroups/<resource_id>/providers/Microsoft.Compute/diskEncryptionSets/<disk_encryption_set_name>
5. Scale machineset 

Actual results:

'failed to create vm <vm_name>:
        failure sending request for machine steven-wi-cluster-pzqvm-worker-eastus3-mfk5z:
        cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending
        request: StatusCode=403 -- Original Error: Code="LinkedAuthorizationFailed"
        Message="The client ''55c10ba9-f891-4f42-a697-0ab283b86c63'' with object id
        ''55c10ba9-f891-4f42-a697-0ab283b86c63'' has permission to perform action
        ''Microsoft.Compute/virtualMachines/write'' on scope ''/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/virtualMachines/steven-wi-cluster-pzqvm-worker-eastus3-mfk5z'';
        however, it does not have permission to perform action ''read'' on the linked
        scope(s) ''/subscriptions/<subscription_id>/resourceGroups/<resource_group>/providers/Microsoft.Compute/diskEncryptionSets/test-disk-encryption-set''
        or the linked scope(s) are invalid."'

Expected results:

The machine is created and joins the cluster successfully.

Additional info:

Docs about preparing disk encryption sets on Azure: https://docs.openshift.com/container-platform/4.12/installing/installing_azure/enabling-user-managed-encryption-azure.html 

Description of problem:

Machine creation should fail when the availabilityZone and the subnet ID do not match. Currently the machine is created successfully despite the mismatch, and the control plane machine set (CPMS) cannot be recreated after being deleted.
By contrast, when the subnet is specified by a filter and the availabilityZone does not match the filter, machine creation does fail.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-01-31-072358

How reproducible:

always

Steps to Reproduce:

1. Create a machineset whose availabilityZone and subnet ID do not match; for example, the availabilityZone is us-east-2a but the subnet ID belongs to us-east-2b

          placement:
            availabilityZone: us-east-2a
            region: us-east-2
          securityGroups:
          - filters:
            - name: tag:Name
              values:
              - huliu-aws1w-nk5xd-worker-sg
          subnet:
            id: subnet-0107b4d7cfa35eb9b 

2. The machine is created successfully in the us-east-2b zone
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                                PHASE     TYPE         REGION      ZONE         AGE
huliu-aws1w-nk5xd-master-0                          Running   m6i.xlarge   us-east-2   us-east-2a   62m
huliu-aws1w-nk5xd-master-1                          Running   m6i.xlarge   us-east-2   us-east-2b   62m
huliu-aws1w-nk5xd-master-2                          Running   m6i.xlarge   us-east-2   us-east-2a   62m
huliu-aws1w-nk5xd-windows-worker-us-east-2a-689vq   Running   m5a.large    us-east-2   us-east-2b   37m
huliu-aws1w-nk5xd-windows-worker-us-east-2a-nf9dl   Running   m5a.large    us-east-2   us-east-2b   37m
huliu-aws1w-nk5xd-worker-us-east-2a-8kpht           Running   m6i.xlarge   us-east-2   us-east-2a   59m
huliu-aws1w-nk5xd-worker-us-east-2a-dmtlc           Running   m6i.xlarge   us-east-2   us-east-2a   59m
huliu-aws1w-nk5xd-worker-us-east-2b-kdn75           Running   m6i.xlarge   us-east-2   us-east-2b   59m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine -o yaml |grep "id: subnet"
          id: subnet-0fef0e9e255742f3a
          id: subnet-0107b4d7cfa35eb9b
          id: subnet-0fef0e9e255742f3a
          id: subnet-0107b4d7cfa35eb9b
          id: subnet-0107b4d7cfa35eb9b
          id: subnet-0fef0e9e255742f3a
          id: subnet-0fef0e9e255742f3a
          id: subnet-0107b4d7cfa35eb9b 

Actual results:

The machine is created successfully in the zone the subnet ID belongs to; in this case it was created in us-east-2b

huliu-aws1w-nk5xd-windows-worker-us-east-2a-689vq   Running   m5a.large    us-east-2   us-east-2b   37m
huliu-aws1w-nk5xd-windows-worker-us-east-2a-nf9dl   Running   m5a.large    us-east-2   us-east-2b   37m

Expected results:

Machine creation should fail because the availabilityZone and the subnet ID do not match
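As an illustration of the expected validation, a minimal Go sketch (not the machine-api implementation) that looks up the subnet with the AWS SDK and rejects the configuration when the subnet's actual availability zone differs from the requested placement zone; client construction and error handling are simplified assumptions.

package main

import (
    "fmt"
    "log"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/ec2"
)

// validateSubnetZone returns an error when the subnet does not live in the
// availability zone requested by the machine's placement.
func validateSubnetZone(svc *ec2.EC2, subnetID, wantZone string) error {
    out, err := svc.DescribeSubnets(&ec2.DescribeSubnetsInput{
        SubnetIds: []*string{aws.String(subnetID)},
    })
    if err != nil {
        return err
    }
    if len(out.Subnets) == 0 {
        return fmt.Errorf("subnet %s not found", subnetID)
    }
    if got := aws.StringValue(out.Subnets[0].AvailabilityZone); got != wantZone {
        return fmt.Errorf("subnet %s is in %s, but placement requests %s", subnetID, got, wantZone)
    }
    return nil
}

func main() {
    svc := ec2.New(session.Must(session.NewSession()), aws.NewConfig().WithRegion("us-east-2"))
    // Values taken from the report; the subnet ID is only illustrative.
    if err := validateSubnetZone(svc, "subnet-0107b4d7cfa35eb9b", "us-east-2a"); err != nil {
        log.Fatal(err)
    }
}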

Additional info:

1. When the subnet is specified by a filter, a mismatch between the availabilityZone and the filter does make machine creation fail.

huliu-aws1w2-x2tnx-worker-2-m4r8m            Failed                                          4s 
liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-aws1w2-x2tnx-worker-2-m4r8m  -o yaml
…
      placement:
        availabilityZone: us-east-2a
        region: us-east-2
      securityGroups:
      - filters:
        - name: tag:Name
          values:
          - huliu-aws1w2-x2tnx-worker-sg
      spotMarketOptions: {}
      subnet:
        filters:
        - name: tag:Name
          values:
          - huliu-aws1w2-x2tnx-private-us-east-2c
      tags:
      - name: kubernetes.io/cluster/huliu-aws1w2-x2tnx
        value: owned
      userDataSecret:
        name: worker-user-data
status:
  conditions:
  - lastTransitionTime: "2023-02-01T02:45:52Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2023-02-01T02:45:52Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2023-02-01T02:45:52Z"
    status: "True"
    type: Terminable
  errorMessage: 'error getting subnet IDs: no subnet IDs were found'
  errorReason: InvalidConfiguration
  lastUpdated: "2023-02-01T02:45:53Z"
  phase: Failed
  providerStatus:
    conditions:
    - lastTransitionTime: "2023-02-01T02:45:53Z"
      message: 'error getting subnet IDs: no subnet IDs were found'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreation

2. For this case, where the machine is created successfully despite the availabilityZone/subnet ID mismatch, the CPMS cannot be recreated after being deleted.

liuhuali@Lius-MacBook-Pro huali-test % oc delete controlplanemachineset cluster 
controlplanemachineset.machine.openshift.io "cluster" deleted
liuhuali@Lius-MacBook-Pro huali-test % oc get controlplanemachineset                                
No resources found in openshift-machine-api namespace.

I0201 02:11:07.850022       1 http.go:143] controller-runtime/webhook/webhooks "msg"="wrote response" "UID"="12f118c4-fafe-45f9-bd24-876abdb8ba83" "allowed"=false "code"=403 "reason"="spec.template.machines_v1beta1_machine_openshift_io.failureDomains: Forbidden: no control plane machine is using specified failure domain(s) [AWSFailureDomain{AvailabilityZone:us-east-2a, Subnet:{Type:ID, Value:subnet-0107b4d7cfa35eb9b}}], failure domain(s) [AWSFailureDomain{AvailabilityZone:us-east-2a, Subnet:{Type:ID, Value:subnet-0fef0e9e255742f3a}}] are duplicated within the control plane machines, please correct failure domains to match control plane machines" "webhook"="/validate-machine-openshift-io-v1-controlplanemachineset"
I0201 02:11:07.850787       1 controller.go:144]  "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="767c4631-ed83-47da-b316-29a21cdba245"
E0201 02:11:07.850828       1 controller.go:326]  "msg"="Reconciler error" "error"="error reconciling control plane machine set: unable to create control plane machine set: unable to create control plane machine set: admission webhook \"controlplanemachineset.machine.openshift.io\" denied the request: spec.template.machines_v1beta1_machine_openshift_io.failureDomains: Forbidden: no control plane machine is using specified failure domain(s) [AWSFailureDomain{AvailabilityZone:us-east-2a, Subnet:{Type:ID, Value:subnet-0107b4d7cfa35eb9b}}], failure domain(s) [AWSFailureDomain{AvailabilityZone:us-east-2a, Subnet:{Type:ID, Value:subnet-0fef0e9e255742f3a}}] are duplicated within the control plane machines, please correct failure domains to match control plane machines" "controller"="controlplanemachinesetgenerator" "reconcileID"="767c4631-ed83-47da-b316-29a21cdba245"

Description of problem:


This test is permafailing on techpreview since https://github.com/openshift/origin/pull/27915 landed

[sig-instrumentation][Late] Alerts shouldn't exceed the 650 series limit of total series sent via telemetry from each cluster [Suite:openshift/conformance/parallel]

            s: "promQL query returned unexpected results:\navg_over_time(cluster:telemetry_selected_series:count[49m15s]) >= 650\n[\n  {\n    \"metric\": {\n      \"prometheus\": \"openshift-monitoring/k8s\"\n    },\n    \"value\": [\n      1685504058.881,\n      \"700.3636363636364\"\n    ]\n  }\n]",


Version-Release number of selected component (if applicable):


4.14

How reproducible:


Always

Steps to Reproduce:

1. Run conformance tests on a techpreview cluster

Actual results:

Test fails

Expected results:

Test succeeds

Additional info:


Example job https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-azure-sdn-techpreview/1663723476923453440

Description of problem:

A cluster was installed via ACM and its nodes are showing as Unmanaged. When trying to set the BMH credential via the console, the Apply button is not clickable (greyed out).

Version-Release number of selected component (if applicable): 4.11

How reproducible: Always

Steps to Reproduce:
1. Install a cluster via ACM
2. Setting a BMH credential on console
3.

Actual results:

The Apply button on the console screen is greyed out, unclickable.

Expected results:

Should be able to configure the BMH credential

Additional info:

The CLI option --logtostderr was removed in prometheus-adapter v0.11. CMO uses this argument, which currently blocks the update to v0.11: https://github.com/openshift/k8s-prometheus-adapter/pull/72

If I understand correctly, we can simply drop this argument.

Description of problem:
This is a follow-up on https://bugzilla.redhat.com/show_bug.cgi?id=2083087 and https://github.com/openshift/console/pull/12390

When creating a Deployment, DeploymentConfig, or Knative Service with the Pipeline option enabled, and then deleting it again with the option "Delete other resources created by console" enabled (only available on 4.13+ with the PR above), the automatically created Pipeline is not deleted.

When the user tries to create the same resource with a Pipeline again, this fails with an error:

An error occurred
secrets "nodeinfo-generic-webhook-secret" already exists

Version-Release number of selected component (if applicable):
4.13

(we might want to backport this together with https://github.com/openshift/console/pull/12390 and OCPBUGS-5547)

How reproducible:
Always

Steps to Reproduce:

  1. Install OpenShift Pipelines operator (tested with 1.8.2)
  2. Create a new project
  3. Navigate to Add > Import from git and create an application
  4. Case 1: In the topology select the new resource and delete it
  5. Case 2: In the topology select the application group and delete the complete app

Actual results:
Case 1: Delete resources:

  1. Deployment (tries it twice!) $name
  2. Service $name
  3. Route $name
  4. ImageStream $name

Case 2: Delete application:

  1. Deployment (just once) $name
  2. Service $name
  3. Route $name
  4. ImageStream $name

Expected results:
Case 1: Delete resource:

  1. Delete Deployment $name should be called just once
  2. (Keep this deletion) Service $name
  3. (Keep this deletion) Route $name
  4. (Keep this deletion) ImageStream $name
  5. Missing deletion of the Tekton Pipeline $name
  6. Missing deletion of the Tekton TriggerTemplate with generated name trigger-template-$name-$random
  7. Missing deletion of the Secret $name-generic-webhook-secret
  8. Missing deletion of the Secret $name-github-webhook-secret

Case 2: Delete application:

  1. (Keep this deletion) Deployment $name
  2. (Keep this deletion) Service $name
  3. (Keep this deletion) Route $name
  4. (Keep this deletion) ImageStream $name
  5. Missing deletion of the Tekton Pipeline $name
  6. Missing deletion of the Tekton TriggerTemplate with generated name trigger-template-$name-$random
  7. Missing deletion of the Secret $name-generic-webhook-secret
  8. Missing deletion of the Secret $name-github-webhook-secret

Additional info:

Starting with https://amd64.origin.releases.ci.openshift.org/releasestream/4.13.0-0.okd/release/4.13.0-0.okd-2023-02-28-170012 multiple storage tests are failing:

  [sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (block volmode)] volumes should store data [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
  [sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (block volmode)] provisioning should provision storage with pvc data source [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
  [sig-storage] In-tree Volumes [Driver: local][LocalVolumeType: block] [Testpattern: Pre-provisioned PV (block volmode)] volumes should store data [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
  [sig-storage] PersistentVolumes-local [Volume type: block] One pod requesting one prebound PVC should be able to mount volume and write from pod1 [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
  [sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (block volmode)] provisioning should provision storage with snapshot data source [Feature:VolumeSnapshotDataSource] [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
  [sig-storage] PersistentVolumes-local [Volume type: block] Two pods mounting a local volume at the same time should be able to write from pod1 and read from pod2 [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
  [sig-storage] PersistentVolumes-local [Volume type: block] One pod requesting one prebound PVC should be able to mount volume and read from pod1 [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
  [sig-storage] PersistentVolumes-local [Volume type: block] Two pods mounting a local volume one after the other should be able to write from pod1 and read from pod2 [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
  [sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Dynamic PV (block volmode)] volumes should store data [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]
  [sig-storage] In-tree Volumes [Driver: aws] [Testpattern: Pre-provisioned PV (block volmode)] volumes should store data [Skipped:NoOptionalCapabilities] [Suite:openshift/conformance/parallel] [Suite:k8s]

cc Hemant Kumar

Description of problem:

Running `openshift-install cluster destroy` can overwhelm an OpenStack cloud that holds many Swift objects, if said cloud is low on resources.

In particular, testing the teardown of an OCP cluster with 500,000 objects in the image registry caused RabbitMQ to crash on a standalone (single-host) OpenStack deployment backed by NVMe storage.

Version-Release number of selected component (if applicable):


How reproducible:

On a constrained (single-host) OpenStack cloud, with the default limit of 10000 objects per Swift bulk deletion.

Steps to Reproduce:

1. install OpenShift
2. upload 500000 arbitrary objects in the image-registry container
3. launch cluster teardown
4. enjoy Swift responding with 504 errors and the rest of the cluster becoming unstable
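One possible mitigation direction, sketched generically in Go (the batch size, pacing, and delete callback are assumptions, not the installer's implementation): issue Swift bulk deletions in much smaller, paced batches rather than at the 10000-object default, so a constrained cloud is not flooded.

package main

import (
    "fmt"
    "time"
)

// deleteInBatches splits the object list into small batches and pauses between
// them; deleteBatch would wrap Swift's bulk-delete API in a real implementation.
func deleteInBatches(objects []string, batchSize int, pause time.Duration,
    deleteBatch func(batch []string) error) error {
    for start := 0; start < len(objects); start += batchSize {
        end := start + batchSize
        if end > len(objects) {
            end = len(objects)
        }
        if err := deleteBatch(objects[start:end]); err != nil {
            return err
        }
        time.Sleep(pause)
    }
    return nil
}

func main() {
    objects := make([]string, 5000)
    for i := range objects {
        objects[i] = fmt.Sprintf("image-registry-object-%d", i)
    }
    err := deleteInBatches(objects, 1000, 100*time.Millisecond, func(batch []string) error {
        fmt.Printf("bulk-deleting %d objects\n", len(batch))
        return nil
    })
    if err != nil {
        fmt.Println("teardown failed:", err)
    }
}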

This is a clone of issue OCPBUGS-23397. The following is the description of the original issue:

Description of problem:

The shutdown-delay-duration argument for the openshift-apiserver is set to 3s in hypershift, but set to 15s in core openshift. Hypershift should update the value to match.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Diff the openshift-apiserver configs

Actual results:

https://github.com/openshift/hypershift/blob/3a42e77041535c8ac8012856d279bc782efcaf3c/control-plane-operator/controllers/hostedcontrolplane/oapi/config.go#L59C1-L60C1

Expected results:

https://github.com/openshift/cluster-openshift-apiserver-operator/commit/cad9746b62abf3b3230592d45f7f60bcecc96dac

Additional info:

 

Description of problem

Since the resource type option has been moved into the advanced options in both the Deploy Image and Import from Git flows, there is confusion for some existing customers who use the feature.

The UI no longer makes it clear which type of resource is being created.

Version-Release number of selected component (if applicable)

How reproducible

Steps to Reproduce

1.
2.
3.

Actual results

Expected results

Remove the Resource type selection from Advanced Options and place it back where it was previously. Resource type selection is now a dropdown, so it will go back in its previous spot but use a different component than in 4.11.


Currently the upgrade feature agent is disabled by default and enabled explicitly only for the SaaS environment. This ticket is about enabling it by default also for ACM.
 

This is a clone of issue OCPBUGS-24001. The following is the description of the original issue:

After I installed a "Git" Task from ArtifactHub directly in the Pipeline Builder and then searched for a "git" Task again, the Pipeline Builder crashed.

Steps to reproduce:

  1. Install Pipelines operator
  2. Navigate to Developer perspective > Pipelines
  3. Press on Create to open the Pipeline Builder
  4. Click on "Add task"
  5. Search for "git"
  6. Navigate down to an entry called "git" from the ArtifactHub and press enter to install it.
  7. This automatically imports this Task below into the current project. You can also apply that yaml to reproduce this bug.
  8. Click on "Add task" again
  9. Search for "git"
  10. Navigate down to the different git tasks.

Actual behaviour
Page crashes

Expected behaviour
Page should not crash

Additional information
Created/Imported Task:

apiVersion: tekton.dev/v1
kind: Task
metadata:
  annotations:
    openshift.io/installed-from: ArtifactHub
    tekton.dev/categories: Git
    tekton.dev/displayName: git
    tekton.dev/pipelines.minVersion: 0.38.0
    tekton.dev/platforms: 'linux/amd64,linux/s390x,linux/ppc64le,linux/arm64'
    tekton.dev/tags: git
  resourceVersion: '50218855'
  name: git
  uid: 1b88150a-f2c1-4030-9849-c7806c0745d8
  creationTimestamp: '2023-11-28T10:54:51Z'
  generation: 1
  labels:
    app.kubernetes.io/version: 0.1.0
spec:
  description: |
    This Task represents Git and is able to initialize and clone a remote repository on the informed Workspace. It's likely to become the first `step` on a Pipeline.
  params:
    - description: |
        Git repository URL.
      name: URL
      type: string
    - default: main
      description: |
        Revision to checkout, an branch, tag, sha, ref, etc...
      name: REVISION
      type: string
    - default: ''
      description: |
        Repository `refspec` to fetch before checking out the revision.
      name: REFSPEC
      type: string
    - default: 'true'
      description: |
        Initialize and fetch Git submodules.
      name: SUBMODULES
      type: string
    - default: '1'
      description: |
        Number of commits to fetch, a "shallow clone" is a single commit.
      name: DEPTH
      type: string
    - default: 'true'
      description: |
        Sets the global `http.sslVerify` value, `false` is not advised unless
        you trust the remote repository.
      name: SSL_VERIFY
      type: string
    - default: ca-bundle.crt
      description: |
        Certificate Authority (CA) bundle filename on the `ssl-ca-directory`
        Workspace.
      name: CRT_FILENAME
      type: string
    - default: ''
      description: |
        Relative path to the `output` Workspace where the repository will be
        cloned.
      name: SUBDIRECTORY
      type: string
    - default: ''
      description: |
        List of directory patterns split by comma to perform "sparse checkout".
      name: SPARSE_CHECKOUT_DIRECTORIES
      type: string
    - default: 'true'
      description: |
        Clean out the contents of the `output` Workspace before cloning the
        repository, if data exists.
      name: DELETE_EXISTING
      type: string
    - default: ''
      description: |
        HTTP proxy server (non-TLS requests).
      name: HTTP_PROXY
      type: string
    - default: ''
      description: |
        HTTPS proxy server (TLS requests).
      name: HTTPS_PROXY
      type: string
    - default: ''
      description: |
        Opt out of proxying HTTP/HTTPS requests.
      name: NO_PROXY
      type: string
    - default: 'false'
      description: |
        Log the commands executed.
      name: VERBOSE
      type: string
    - default: /home/git
      description: |
        Absolute path to the Git user home directory.
      name: USER_HOME
      type: string
  results:
    - description: |
        The precise commit SHA digest cloned.
      name: COMMIT
      type: string
    - description: |
        The precise repository URL.
      name: URL
      type: string
    - description: |
        The epoch timestamp of the commit cloned.
      name: COMMITTER_DATE
      type: string
  stepTemplate:
    computeResources:
      limits:
        cpu: 100m
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 256Mi
    env:
      - name: PARAMS_URL
        value: $(params.URL)
      - name: PARAMS_REVISION
        value: $(params.REVISION)
      - name: PARAMS_REFSPEC
        value: $(params.REFSPEC)
      - name: PARAMS_SUBMODULES
        value: $(params.SUBMODULES)
      - name: PARAMS_DEPTH
        value: $(params.DEPTH)
      - name: PARAMS_SSL_VERIFY
        value: $(params.SSL_VERIFY)
      - name: PARAMS_CRT_FILENAME
        value: $(params.CRT_FILENAME)
      - name: PARAMS_SUBDIRECTORY
        value: $(params.SUBDIRECTORY)
      - name: PARAMS_SPARSE_CHECKOUT_DIRECTORIES
        value: $(params.SPARSE_CHECKOUT_DIRECTORIES)
      - name: PARAMS_DELETE_EXISTING
        value: $(params.DELETE_EXISTING)
      - name: PARAMS_HTTP_PROXY
        value: $(params.HTTP_PROXY)
      - name: PARAMS_HTTPS_PROXY
        value: $(params.HTTPS_PROXY)
      - name: PARAMS_NO_PROXY
        value: $(params.NO_PROXY)
      - name: PARAMS_VERBOSE
        value: $(params.VERBOSE)
      - name: PARAMS_USER_HOME
        value: $(params.USER_HOME)
      - name: WORKSPACES_OUTPUT_PATH
        value: $(workspaces.output.path)
      - name: WORKSPACES_SSH_DIRECTORY_BOUND
        value: $(workspaces.ssh-directory.bound)
      - name: WORKSPACES_SSH_DIRECTORY_PATH
        value: $(workspaces.ssh-directory.path)
      - name: WORKSPACES_BASIC_AUTH_BOUND
        value: $(workspaces.basic-auth.bound)
      - name: WORKSPACES_BASIC_AUTH_PATH
        value: $(workspaces.basic-auth.path)
      - name: WORKSPACES_SSL_CA_DIRECTORY_BOUND
        value: $(workspaces.ssl-ca-directory.bound)
      - name: WORKSPACES_SSL_CA_DIRECTORY_PATH
        value: $(workspaces.ssl-ca-directory.path)
      - name: RESULTS_COMMITTER_DATE_PATH
        value: $(results.COMMITTER_DATE.path)
      - name: RESULTS_COMMIT_PATH
        value: $(results.COMMIT.path)
      - name: RESULTS_URL_PATH
        value: $(results.URL.path)
    securityContext:
      runAsNonRoot: true
      runAsUser: 65532
  steps:
    - computeResources: {}
      image: 'gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:latest'
      name: load-scripts
      script: |
        printf '%s' "IyEvdXNyL2Jpbi9lbnYgc2gKCmV4cG9ydCBQQVJBTVNfVVJMPSIke1BBUkFNU19VUkw6LX0iCmV4cG9ydCBQQVJBTVNfUkVWSVNJT049IiR7UEFSQU1TX1JFVklTSU9OOi19IgpleHBvcnQgUEFSQU1TX1JFRlNQRUM9IiR7UEFSQU1TX1JFRlNQRUM6LX0iCmV4cG9ydCBQQVJBTVNfU1VCTU9EVUxFUz0iJHtQQVJBTVNfU1VCTU9EVUxFUzotfSIKZXhwb3J0IFBBUkFNU19ERVBUSD0iJHtQQVJBTVNfREVQVEg6LX0iCmV4cG9ydCBQQVJBTVNfU1NMX1ZFUklGWT0iJHtQQVJBTVNfU1NMX1ZFUklGWTotfSIKZXhwb3J0IFBBUkFNU19DUlRfRklMRU5BTUU9IiR7UEFSQU1TX0NSVF9GSUxFTkFNRTotfSIKZXhwb3J0IFBBUkFNU19TVUJESVJFQ1RPUlk9IiR7UEFSQU1TX1NVQkRJUkVDVE9SWTotfSIKZXhwb3J0IFBBUkFNU19TUEFSU0VfQ0hFQ0tPVVRfRElSRUNUT1JJRVM9IiR7UEFSQU1TX1NQQVJTRV9DSEVDS09VVF9ESVJFQ1RPUklFUzotfSIKZXhwb3J0IFBBUkFNU19ERUxFVEVfRVhJU1RJTkc9IiR7UEFSQU1TX0RFTEVURV9FWElTVElORzotfSIKZXhwb3J0IFBBUkFNU19IVFRQX1BST1hZPSIke1BBUkFNU19IVFRQX1BST1hZOi19IgpleHBvcnQgUEFSQU1TX0hUVFBTX1BST1hZPSIke1BBUkFNU19IVFRQU19QUk9YWTotfSIKZXhwb3J0IFBBUkFNU19OT19QUk9YWT0iJHtQQVJBTVNfTk9fUFJPWFk6LX0iCmV4cG9ydCBQQVJBTVNfVkVSQk9TRT0iJHtQQVJBTVNfVkVSQk9TRTotfSIKZXhwb3J0IFBBUkFNU19VU0VSX0hPTUU9IiR7UEFSQU1TX1VTRVJfSE9NRTotfSIKCmV4cG9ydCBXT1JLU1BBQ0VTX09VVFBVVF9QQVRIPSIke1dPUktTUEFDRVNfT1VUUFVUX1BBVEg6LX0iCmV4cG9ydCBXT1JLU1BBQ0VTX1NTSF9ESVJFQ1RPUllfQk9VTkQ9IiR7V09SS1NQQUNFU19TU0hfRElSRUNUT1JZX0JPVU5EOi19IgpleHBvcnQgV09SS1NQQUNFU19TU0hfRElSRUNUT1JZX1BBVEg9IiR7V09SS1NQQUNFU19TU0hfRElSRUNUT1JZX1BBVEg6LX0iCmV4cG9ydCBXT1JLU1BBQ0VTX0JBU0lDX0FVVEhfQk9VTkQ9IiR7V09SS1NQQUNFU19CQVNJQ19BVVRIX0JPVU5EOi19IgpleHBvcnQgV09SS1NQQUNFU19CQVNJQ19BVVRIX1BBVEg9IiR7V09SS1NQQUNFU19CQVNJQ19BVVRIX1BBVEg6LX0iCmV4cG9ydCBXT1JLU1BBQ0VTX1NTTF9DQV9ESVJFQ1RPUllfQk9VTkQ9IiR7V09SS1NQQUNFU19TU0xfQ0FfRElSRUNUT1JZX0JPVU5EOi19IgpleHBvcnQgV09SS1NQQUNFU19TU0xfQ0FfRElSRUNUT1JZX1BBVEg9IiR7V09SS1NQQUNFU19TU0xfQ0FfRElSRUNUT1JZX1BBVEg6LX0iCgpleHBvcnQgUkVTVUxUU19DT01NSVRURVJfREFURV9QQVRIPSIke1JFU1VMVFNfQ09NTUlUVEVSX0RBVEVfUEFUSDotfSIKZXhwb3J0IFJFU1VMVFNfQ09NTUlUX1BBVEg9IiR7UkVTVUxUU19DT01NSVRfUEFUSDotfSIKZXhwb3J0IFJFU1VMVFNfVVJMX1BBVEg9IiR7UkVTVUxUU19VUkxfUEFUSDotfSIKCiMgZnVsbCBwYXRoIHRvIHRoZSBjaGVja291dCBkaXJlY3RvcnksIHVzaW5nIHRoZSBvdXRwdXQgd29ya3NwYWNlIGFuZCBzdWJkaXJlY3RvciBwYXJhbWV0ZXIKZXhwb3J0IGNoZWNrb3V0X2Rpcj0iJHtXT1JLU1BBQ0VTX09VVFBVVF9QQVRIfS8ke1BBUkFNU19TVUJESVJFQ1RPUll9IgoKIwojIEZ1bmN0aW9ucwojCgpmYWlsKCkgewogICAgZWNobyAiRVJST1I6ICR7QH0iIDE+JjIKICAgIGV4aXQgMQp9CgpwaGFzZSgpIHsKICAgIGVjaG8gIi0tLT4gUGhhc2U6ICR7QH0uLi4iCn0KCiMgSW5zcGVjdCB0aGUgZW52aXJvbm1lbnQgdmFyaWFibGVzIHRvIGFzc2VydCB0aGUgbWluaW11bSBjb25maWd1cmF0aW9uIGlzIGluZm9ybWVkLgphc3NlcnRfcmVxdWlyZWRfY29uZmlndXJhdGlvbl9vcl9mYWlsKCkgewogICAgW1sgLXogIiR7UEFSQU1TX1VSTH0iIF1dICYmCiAgICAgICAgZmFpbCAiUGFyYW1ldGVyIFVSTCBpcyBub3Qgc2V0ISIKCiAgICBbWyAteiAiJHtXT1JLU1BBQ0VTX09VVFBVVF9QQVRIfSIgXV0gJiYKICAgICAgICBmYWlsICJPdXRwdXQgV29ya3NwYWNlIGlzIG5vdCBzZXQhIgoKICAgIFtbICEgLWQgIiR7V09SS1NQQUNFU19PVVRQVVRfUEFUSH0iIF1dICYmCiAgICAgICAgZmFpbCAiT3V0cHV0IFdvcmtzcGFjZSBkaXJlY3RvcnkgJyR7V09SS1NQQUNFU19PVVRQVVRfUEFUSH0nIG5vdCBmb3VuZCEiCgogICAgcmV0dXJuIDAKfQoKIyBDb3B5IHRoZSBmaWxlIGludG8gdGhlIGRlc3RpbmF0aW9uLCBjaGVja2luZyBpZiB0aGUgc291cmNlIGV4aXN0cy4KY29weV9vcl9mYWlsKCkgewogICAgbG9jYWwgX21vZGU9IiR7MX0iCiAgICBsb2NhbCBfc3JjPSIkezJ9IgogICAgbG9jYWwgX2RzdD0iJHszfSIKCiAgICBpZiBbWyAhIC1mICIke19zcmN9IiAmJiAhIC1kICIke19zcmN9IiBdXTsgdGhlbgogICAgICAgIGZhaWwgIlNvdXJjZSBmaWxlL2RpcmVjdG9yeSBpcyBub3QgZm91bmQgYXQgJyR7X3NyY30nIgogICAgZmkKCiAgICBpZiBbWyAtZCAiJHtfc3JjfSIgXV07IHRoZW4KICAgICAgICBjcCAtUnYgJHtfc3JjfSAke19kc3R9CiAgICAgICAgY2htb2QgLXYgJHtfbW9kZX0gJHtfZHN0fQogICAgZWxzZQogICAgICAgIGluc3RhbGwgLS12ZXJib3NlIC0tbW9kZT0ke19tb2RlfSAke19zcmN9ICR7X2RzdH0KICAgIGZpCn0KCiMgRGVsZXRlIGFueSBleGlzdGluZyBjb250ZW
50cyBvZiB0aGUgcmVwbyBkaXJlY3RvcnkgaWYgaXQgZXhpc3RzLiBXZSBkb24ndCBqdXN0ICJybSAtcmYgPGRpcj4iCiMgYmVjYXVzZSBtaWdodCBiZSAiLyIgb3IgdGhlIHJvb3Qgb2YgYSBtb3VudGVkIHZvbHVtZS4KY2xlYW5fZGlyKCkgewogICAgbG9jYWwgX2Rpcj0iJHsxfSIKCiAgICBbWyAhIC1kICIke19kaXJ9IiBdXSAmJgogICAgICAgIHJldHVybiAwCgogICAgIyBEZWxldGUgbm9uLWhpZGRlbiBmaWxlcyBhbmQgZGlyZWN0b3JpZXMKICAgIHJtIC1yZnYgJHtfZGlyOj99LyoKICAgICMgRGVsZXRlIGZpbGVzIGFuZCBkaXJlY3RvcmllcyBzdGFydGluZyB3aXRoIC4gYnV0IGV4Y2x1ZGluZyAuLgogICAgcm0gLXJmdiAke19kaXJ9Ly5bIS5dKgogICAgIyBEZWxldGUgZmlsZXMgYW5kIGRpcmVjdG9yaWVzIHN0YXJ0aW5nIHdpdGggLi4gcGx1cyBhbnkgb3RoZXIgY2hhcmFjdGVyCiAgICBybSAtcmZ2ICR7X2Rpcn0vLi4/Kgp9CgojCiMgU2V0dGluZ3MKIwoKIyB3aGVuIHRoZSBrby1hcHAgZGlyZWN0b3J5IGlzIHByZXNlbnQsIG1ha2luZyBzdXJlIGl0J3MgcGFydCBvZiB0aGUgUEFUSApbWyAtZCAiL2tvLWFwcCIgXV0gJiYgZXhwb3J0IFBBVEg9IiR7UEFUSH06L2tvLWFwcCIKCiMgbWFraW5nIHRoZSBzaGVsbCB2ZXJib3NlIHdoZW4gdGhlIHBhcmFtdGVyIGlzIHNldApbWyAiJHtQQVJBTVNfVkVSQk9TRX0iID09ICJ0cnVlIiBdXSAmJiBzZXQgLXgKCnJldHVybiAw" |base64 -d >common.sh
        chmod +x "common.sh"
        printf '%s' "IyEvdXNyL2Jpbi9lbnYgc2gKIwojIEV4cG9ydHMgcHJveHkgYW5kIGN1c3RvbSBTU0wgQ0EgY2VydGlmaWNhdHMgaW4gdGhlIGVudmlyb21lbnQgYW5kIHJ1bnMgdGhlIGdpdC1pbml0IHdpdGggZmxhZ3MKIyBiYXNlZCBvbiB0aGUgdGFzayBwYXJhbWV0ZXJzLgojCgpzZXQgLWV1Cgpzb3VyY2UgJChDRFBBVEg9IGNkIC0tICIkKGRpcm5hbWUgLS0gJHswfSkiICYmIHB3ZCkvY29tbW9uLnNoCgphc3NlcnRfcmVxdWlyZWRfY29uZmlndXJhdGlvbl9vcl9mYWlsCgojCiMgQ0EgKGBzc2wtY2EtZGlyZWN0b3J5YCBXb3Jrc3BhY2UpCiMKCmlmIFtbICIke1dPUktTUEFDRVNfU1NMX0NBX0RJUkVDVE9SWV9CT1VORH0iID09ICJ0cnVlIiAmJiAtbiAiJHtQQVJBTVNfQ1JUX0ZJTEVOQU1FfSIgXV07IHRoZW4KCXBoYXNlICJJbnNwZWN0aW5nICdzc2wtY2EtZGlyZWN0b3J5JyB3b3Jrc3BhY2UgbG9va2luZyBmb3IgJyR7UEFSQU1TX0NSVF9GSUxFTkFNRX0nIGZpbGUiCgljcnQ9IiR7V09SS1NQQUNFU19TU0xfQ0FfRElSRUNUT1JZX1BBVEh9LyR7UEFSQU1TX0NSVF9GSUxFTkFNRX0iCglbWyAhIC1mICIke2NydH0iIF1dICYmCgkJZmFpbCAiQ1JUIGZpbGUgKFBBUkFNU19DUlRfRklMRU5BTUUpIG5vdCBmb3VuZCBhdCAnJHtjcnR9JyIKCglwaGFzZSAiRXhwb3J0aW5nIGN1c3RvbSBDQSBjZXJ0aWZpY2F0ZSAnR0lUX1NTTF9DQUlORk89JHtjcnR9JyIKCWV4cG9ydCBHSVRfU1NMX0NBSU5GTz0ke2NydH0KZmkKCiMKIyBQcm94eSBTZXR0aW5ncwojCgpwaGFzZSAiU2V0dGluZyB1cCBIVFRQX1BST1hZPScke1BBUkFNU19IVFRQX1BST1hZfSciCltbIC1uICIke1BBUkFNU19IVFRQX1BST1hZfSIgXV0gJiYgZXhwb3J0IEhUVFBfUFJPWFk9IiR7UEFSQU1TX0hUVFBfUFJPWFl9IgoKcGhhc2UgIlNldHR0aW5nIHVwIEhUVFBTX1BST1hZPScke1BBUkFNU19IVFRQU19QUk9YWX0nIgpbWyAtbiAiJHtQQVJBTVNfSFRUUFNfUFJPWFl9IiBdXSAmJiBleHBvcnQgSFRUUFNfUFJPWFk9IiR7UEFSQU1TX0hUVFBTX1BST1hZfSIKCnBoYXNlICJTZXR0aW5nIHVwIE5PX1BST1hZPScke1BBUkFNU19OT19QUk9YWX0nIgpbWyAtbiAiJHtQQVJBTVNfTk9fUFJPWFl9IiBdXSAmJiBleHBvcnQgTk9fUFJPWFk9IiR7UEFSQU1TX05PX1BST1hZfSIKCiMKIyBHaXQgQ2xvbmUKIwoKcGhhc2UgIlNldHRpbmcgb3V0cHV0IHdvcmtzcGFjZSBhcyBzYWZlIGRpcmVjdG9yeSAoJyR7V09SS1NQQUNFU19PVVRQVVRfUEFUSH0nKSIKZ2l0IGNvbmZpZyAtLWdsb2JhbCAtLWFkZCBzYWZlLmRpcmVjdG9yeSAiJHtXT1JLU1BBQ0VTX09VVFBVVF9QQVRIfSIKCnBoYXNlICJDbG9uaW5nICcke1BBUkFNU19VUkx9JyBpbnRvICcke2NoZWNrb3V0X2Rpcn0nIgpzZXQgLXgKZXhlYyBnaXQtaW5pdCBcCgktdXJsPSIke1BBUkFNU19VUkx9IiBcCgktcmV2aXNpb249IiR7UEFSQU1TX1JFVklTSU9OfSIgXAoJLXJlZnNwZWM9IiR7UEFSQU1TX1JFRlNQRUN9IiBcCgktcGF0aD0iJHtjaGVja291dF9kaXJ9IiBcCgktc3NsVmVyaWZ5PSIke1BBUkFNU19TU0xfVkVSSUZZfSIgXAoJLXN1Ym1vZHVsZXM9IiR7UEFSQU1TX1NVQk1PRFVMRVN9IiBcCgktZGVwdGg9IiR7UEFSQU1TX0RFUFRIfSIgXAoJLXNwYXJzZUNoZWNrb3V0RGlyZWN0b3JpZXM9IiR7UEFSQU1TX1NQQVJTRV9DSEVDS09VVF9ESVJFQ1RPUklFU30iCg==" |base64 -d >git-clone.sh
        chmod +x "git-clone.sh"
        printf '%s' "IyEvdXNyL2Jpbi9lbnYgc2gKIwojIFNldHMgdXAgdGhlIGJhc2ljIGFuZCBTU0ggYXV0aGVudGljYXRpb24gYmFzZWQgb24gaW5mb3JtZWQgd29ya3NwYWNlcywgYXMgd2VsbCBhcyBjbGVhbmluZyB1cCB0aGUKIyBwcmV2aW91cyBnaXQtY2xvbmUgc3RhbGUgZGF0YS4KIwoKc2V0IC1ldQoKc291cmNlICQoQ0RQQVRIPSBjZCAtLSAiJChkaXJuYW1lIC0tICR7MH0pIiAmJiBwd2QpL2NvbW1vbi5zaAoKYXNzZXJ0X3JlcXVpcmVkX2NvbmZpZ3VyYXRpb25fb3JfZmFpbAoKcGhhc2UgIlByZXBhcmluZyB0aGUgZmlsZXN5c3RlbSBiZWZvcmUgY2xvbmluZyB0aGUgcmVwb3NpdG9yeSIKCmlmIFtbICIke1dPUktTUEFDRVNfQkFTSUNfQVVUSF9CT1VORH0iID09ICJ0cnVlIiBdXTsgdGhlbgoJcGhhc2UgIkNvbmZpZ3VyaW5nIEdpdCBhdXRoZW50aWNhdGlvbiB3aXRoICdiYXNpYy1hdXRoJyBXb3Jrc3BhY2UgZmlsZXMiCgoJZm9yIGYgaW4gLmdpdC1jcmVkZW50aWFscyAuZ2l0Y29uZmlnOyBkbwoJCXNyYz0iJHtXT1JLU1BBQ0VTX0JBU0lDX0FVVEhfUEFUSH0vJHtmfSIKCQlwaGFzZSAiQ29weWluZyAnJHtzcmN9JyB0byAnJHtQQVJBTVNfVVNFUl9IT01FfSciCgkJY29weV9vcl9mYWlsIDQwMCAke3NyY30gIiR7UEFSQU1TX1VTRVJfSE9NRX0vIgoJZG9uZQpmaQoKaWYgW1sgIiR7V09SS1NQQUNFU19TU0hfRElSRUNUT1JZX0JPVU5EfSIgPT0gInRydWUiIF1dOyB0aGVuCglwaGFzZSAiQ29weWluZyAnLnNzaCcgZnJvbSBzc2gtZGlyZWN0b3J5IHdvcmtzcGFjZSAoJyR7V09SS1NQQUNFU19TU0hfRElSRUNUT1JZX1BBVEh9JykiCgoJZG90X3NzaD0iJHtQQVJBTVNfVVNFUl9IT01FfS8uc3NoIgoJY29weV9vcl9mYWlsIDcwMCAke1dPUktTUEFDRVNfU1NIX0RJUkVDVE9SWV9QQVRIfSAke2RvdF9zc2h9CgljaG1vZCAtUnYgNDAwICR7ZG90X3NzaH0vKgpmaQoKaWYgW1sgIiR7UEFSQU1TX0RFTEVURV9FWElTVElOR30iID09ICJ0cnVlIiBdXTsgdGhlbgoJcGhhc2UgIkRlbGV0aW5nIGFsbCBjb250ZW50cyBvZiBjaGVja291dC1kaXIgJyR7Y2hlY2tvdXRfZGlyfSciCgljbGVhbl9kaXIgJHtjaGVja291dF9kaXJ9IHx8IHRydWUKZmkKCmV4aXQgMA==" |base64 -d >prepare.sh
        chmod +x "prepare.sh"
        printf '%s' "IyEvdXNyL2Jpbi9lbnYgc2gKIwojIFNjYW4gdGhlIGNsb25lZCByZXBvc2l0b3J5IGluIG9yZGVyIHRvIHJlcG9ydCBkZXRhaWxzIHdyaXR0aW5nIHRoZSByZXN1bHQgZmlsZXMuCiMKCnNldCAtZXUKCnNvdXJjZSAkKENEUEFUSD0gY2QgLS0gIiQoZGlybmFtZSAtLSAkezB9KSIgJiYgcHdkKS9jb21tb24uc2gKCmFzc2VydF9yZXF1aXJlZF9jb25maWd1cmF0aW9uX29yX2ZhaWwKCnBoYXNlICJDb2xsZWN0aW5nIGNsb25lZCByZXBvc2l0b3J5IGluZm9ybWF0aW9uICgnJHtjaGVja291dF9kaXJ9JykiCgpjZCAiJHtjaGVja291dF9kaXJ9IiB8fCBmYWlsICJOb3QgYWJsZSB0byBlbnRlciBjaGVja291dC1kaXIgJyR7Y2hlY2tvdXRfZGlyfSciCgpwaGFzZSAiU2V0dGluZyBvdXRwdXQgd29ya3NwYWNlIGFzIHNhZmUgZGlyZWN0b3J5ICgnJHtXT1JLU1BBQ0VTX09VVFBVVF9QQVRIfScpIgpnaXQgY29uZmlnIC0tZ2xvYmFsIC0tYWRkIHNhZmUuZGlyZWN0b3J5ICIke1dPUktTUEFDRVNfT1VUUFVUX1BBVEh9IgoKcmVzdWx0X3NoYT0iJChnaXQgcmV2LXBhcnNlIEhFQUQpIgpyZXN1bHRfY29tbWl0dGVyX2RhdGU9IiQoZ2l0IGxvZyAtMSAtLXByZXR0eT0lY3QpIgoKcGhhc2UgIlJlcG9ydGluZyBsYXN0IGNvbW1pdCBkYXRlICcke3Jlc3VsdF9jb21taXR0ZXJfZGF0ZX0nIgpwcmludGYgIiVzIiAiJHtyZXN1bHRfY29tbWl0dGVyX2RhdGV9IiA+JHtSRVNVTFRTX0NPTU1JVFRFUl9EQVRFX1BBVEh9CgpwaGFzZSAiUmVwb3J0aW5nIHBhcnNlZCByZXZpc2lvbiBTSEEgJyR7cmVzdWx0X3NoYX0nIgpwcmludGYgIiVzIiAiJHtyZXN1bHRfc2hhfSIgPiR7UkVTVUxUU19DT01NSVRfUEFUSH0KCnBoYXNlICJSZXBvcnRpbmcgcmVwb3NpdG9yeSBVUkwgJyR7UEFSQU1TX1VSTH0nIgpwcmludGYgIiVzIiAiJHtQQVJBTVNfVVJMfSIgPiR7UkVTVUxUU19VUkxfUEFUSH0KCmV4aXQgMA==" |base64 -d >report.sh
        chmod +x "report.sh"
      volumeMounts:
        - mountPath: /scripts
          name: scripts-dir
      workingDir: /scripts
    - command:
        - /scripts/prepare.sh
      computeResources: {}
      image: 'gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:latest'
      name: prepare
      volumeMounts:
        - mountPath: /scripts
          name: scripts-dir
        - mountPath: $(params.USER_HOME)
          name: user-home
    - command:
        - /scripts/git-clone.sh
      computeResources: {}
      image: 'gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:latest'
      name: git-clone
      volumeMounts:
        - mountPath: /scripts
          name: scripts-dir
        - mountPath: $(params.USER_HOME)
          name: user-home
    - command:
        - /scripts/report.sh
      computeResources: {}
      image: 'gcr.io/tekton-releases/github.com/tektoncd/pipeline/cmd/git-init:latest'
      name: report
      volumeMounts:
        - mountPath: /scripts
          name: scripts-dir
  volumes:
    - emptyDir: {}
      name: user-home
    - emptyDir: {}
      name: scripts-dir
  workspaces:
    - description: |
        The Git repository directory, data will be placed on the root of the
        Workspace, or on the relative path defined by the SUBDIRECTORY
        parameter.
      name: output
    - description: |
        A `.ssh` directory with private key, `known_hosts`, `config`, etc.
        Copied to the Git user's home before cloning the repository, in order
        to serve as the authentication mechanism. Binding a Secret to this
        Workspace is strongly recommended over other volume types.
      name: ssh-directory
      optional: true
    - description: |
        A Workspace containing a `.gitconfig` and `.git-credentials` files.
        These will be copied to the user's home before Git commands run. All
        other files in this Workspace are ignored. It is strongly recommended to
        use `ssh-directory` over `basic-auth` whenever possible, and to bind a
        Secret to this Workspace over other volume types.
      name: basic-auth
      optional: true
    - description: |
        A Workspace containing CA certificates. Git will use these to verify
        the peer when interacting with remote repositories over HTTPS.
      name: ssl-ca-directory
      optional: true

I haven't gone back to pin down all affected versions, but I wouldn't be surprised if we've had this exposure for a while. On a 4.12.0-ec.2 cluster, we have:

cluster:usage:resources:sum{resource="podnetworkconnectivitychecks.controlplane.operator.openshift.io"}

currently clocking in around 67983. I've gathered a dump with:

$ oc --as system:admin -n openshift-network-diagnostics get podnetworkconnectivitychecks.controlplane.operator.openshift.io | gzip >checks.gz

And many, many of these reference nodes which no longer exist (the cluster is aggressively autoscaled, with nodes coming and going all the time). We should fix garbage collection on this resource, to avoid consuming excessive amounts of memory in the Kube API server and etcd as they attempt to list the large resource set.
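As a stopgap while proper garbage collection is missing, a pruning pass could delete checks whose node has gone away. Below is a minimal sketch using the dynamic client; the v1alpha1 GroupVersionResource and the heuristic that a check's name embeds the node name are assumptions for illustration, not the operator's actual cleanup logic.

// prune_stale_checks.go: a minimal sketch (not the operator's GC logic) that deletes
// PodNetworkConnectivityCheck objects whose name references a node that no longer exists.
// The name-contains-node heuristic and the v1alpha1 GVR are assumptions.
package main

import (
	"context"
	"fmt"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	kube := kubernetes.NewForConfigOrDie(cfg)
	dyn := dynamic.NewForConfigOrDie(cfg)

	ctx := context.Background()
	nodes, err := kube.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	existing := map[string]bool{}
	for _, n := range nodes.Items {
		existing[n.Name] = true
	}

	gvr := schema.GroupVersionResource{
		Group:    "controlplane.operator.openshift.io",
		Version:  "v1alpha1",
		Resource: "podnetworkconnectivitychecks",
	}
	checks, err := dyn.Resource(gvr).Namespace("openshift-network-diagnostics").List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, c := range checks.Items {
		stale := true
		for node := range existing {
			if strings.Contains(c.GetName(), node) {
				stale = false
				break
			}
		}
		if stale {
			fmt.Println("deleting stale check", c.GetName())
			_ = dyn.Resource(gvr).Namespace("openshift-network-diagnostics").Delete(ctx, c.GetName(), metav1.DeleteOptions{})
		}
	}
}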

Description of problem:

Kubernetes and other associated dependencies need to be updated to protect against potential vulnerabilities.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-11286. The following is the description of the original issue:

Description of problem:


Version-Release number of selected component (if applicable):

OCP 4.13.0-0.nightly-2023-03-23-204038
ODF 4.13.0-121.stable

How reproducible:


Steps to Reproduce:

1. Installed ODF over OCP, everything was fine on the Installed Operators page.
2. Later, when the Installed Operators page was checked again, it crashed with the "Oh no! Something went wrong" error.
3.

Actual results:

 Installed Operators page crashes with "Oh no! Something went wrong." error

Expected results:

 Installed Operators page shouldn't crash

Component and Stack trace logs from the console page- http://pastebin.test.redhat.com/1096522

Additional info:


Description of problem:

I have to create this OCPBUG in order to backport a test to the 4.14 branch.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

What

Backport Cluster Fleet Evaluation to 4.14.

Why

We need to evaluate the number of clusters that would have broken customer workloads after enforcement. Backporting this to 4.14 dramatically increases the probability that we can enforce it in 4.16; otherwise it will take us much longer to gather enough data.

Description of problem:

When the user's pull secret contains a JSON null in the "auth" or "email" keys, assisted service crashes when we attempt to create the cluster:

May 31 21:06:27 example.dev.local service[3389]: time="2023-05-31T09:06:27Z" level=error msg="Failed to registered cluster example with id 3648b06e-4745-4542-9421-78ae2e249c0d" func="github.
com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).RegisterClusterInternal.func1" file="/src/internal/bminventory/inventory.go:448" cluster_id=3648b06e-4745-4542-9421-
78ae2e249c0d go-id=162 pkg=Inventory request_id=1252f666-cf5c-4aae-9be7-7b7a579b5bf6
May 31 21:06:27 example.dev.local service[3389]: 2023/05/31 09:06:27 http: panic serving 10.116.24.118:46262: interface conversion: interface {} is nil, not string
May 31 21:06:27 example.dev.local service[3389]: goroutine 162 [running]:
May 31 21:06:27 example.dev.local service[3389]: net/http.(*conn).serve.func1()
May 31 21:06:27 example.dev.local service[3389]:         /usr/lib/golang/src/net/http/server.go:1850 +0xbf
May 31 21:06:27 example.dev.local service[3389]: panic({0x25d0000, 0xc00148d7d0})
May 31 21:06:27 example.dev.local service[3389]:         /usr/lib/golang/src/runtime/panic.go:890 +0x262
May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/internal/cluster/validations.ParsePullSecret({0xc001ed0780, 0x1c6})
May 31 21:06:27 example.dev.local service[3389]:         /src/internal/cluster/validations/validations.go:106 +0x718
May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/internal/cluster/validations.(*registryPullSecretValidator).ValidatePullSecret(0xc0005880c0, {0xc001ed0780?, 0x7?}, {0x29916da, 0x5})
May 31 21:06:27 example.dev.local service[3389]:         /src/internal/cluster/validations/validations.go:160 +0x54
May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).ValidatePullSecret(...)
May 31 21:06:27 example.dev.local service[3389]:         /src/internal/bminventory/inventory.go:279
May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).RegisterClusterInternal(0xc00112f880, {0x2fd3e20, 0xc00148cd50}, 0x0, {0xc0007c0400, 0xc0008d69a0})
May 31 21:06:27 example.dev.local service[3389]:         /src/internal/bminventory/inventory.go:564 +0x16d0
May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).V2RegisterCluster(0x2fd3e20?, {0x2fd3e20?, 0xc00148cd50?}, {0xc0007c0400?, 0xc0008d69a0?})
May 31 21:06:27 example.dev.local service[3389]:         /src/internal/bminventory/inventory_v2_handlers.go:42 +0x39
May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/restapi.HandlerAPI.func59({0xc0007c0400?, 0xc0008d69a0?}, {0x2390b20?, 0xc0014e0240?})
May 31 21:06:27 example.dev.local service[3389]:         /src/restapi/configure_assisted_install.go:639 +0xaf
May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/restapi/operations/installer.V2RegisterClusterHandlerFunc.Handle(0xc000a9d068?, {0xc0007c0400?, 0xc0008d69a0?}, {0x2390b20?, 0xc0014e0240?})
May 31 21:06:27 example.dev.local service[3389]:         /src/restapi/operations/installer/v2_register_cluster.go:19 +0x3d
May 31 21:06:27 example.dev.local service[3389]: github.com/openshift/assisted-service/restapi/operations/installer.(*V2RegisterCluster).ServeHTTP(0xc000571470, {0x2fc7140, 0xc00034c040}, 0xc0007c0400)
May 31 21:06:27 example.dev.local service[3389]:         /src/restapi/operations/installer/v2_register_cluster.go:66 +0x298
May 31 21:06:27 example.dev.local service[3389]: github.com/go-openapi/runtime/middleware.NewOperationExecutor.func1({0x2fc7140, 0xc00034c040}, 0xc0007c0400)
May 31 21:06:27 example.dev.local service[3389]:         /src/vendor/github.com/go-openapi/runtime/middleware/operation.go:28 +0x59

Version-Release number of selected component (if applicable):

4.12.17

How reproducible:

Probably 100%

Steps to Reproduce:

1. Add to the pull secret in install-config.yaml an auth like:

        "example.com": {
          "auth": null,
          "email": null
        }

2. Generate the agent ISO as usual using "openshift-install agent create image"
3. Boot the ISO on the cluster hosts.

Actual results:

The create-cluster-and-infraenv.service fails to complete. In its log it reports:

    Failed to register cluster with assisted-service: Post \"http://10.1.1.2:8090/api/assisted-install/v2/clusters\": EOF

Expected results:

Cluster is installed.

Additional info:

This is particularly difficult to debug because users don't generally give us their pull secrets. The pull secret file in the agent-gather bundle has individual fields redacted, so it is a better guide than the install-config where the whole thing may be redacted.
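For reference, the panic is the classic unchecked type assertion on a decoded JSON map: "auth": null becomes a nil interface{}, and a bare .(string) conversion then panics. A minimal standalone sketch (not the assisted-service code) of defensive parsing with the comma-ok form:

// pullsecret_sketch.go: shows why `"auth": null` panics with an unchecked type assertion
// and how the comma-ok form avoids it. Standalone sketch, not the assisted-service code.
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	raw := `{"auths": {"example.com": {"auth": null, "email": null}}}`

	var ps struct {
		Auths map[string]map[string]interface{} `json:"auths"`
	}
	if err := json.Unmarshal([]byte(raw), &ps); err != nil {
		panic(err)
	}

	for registry, entry := range ps.Auths {
		// entry["auth"] is a nil interface{} here, so entry["auth"].(string) would panic
		// with "interface conversion: interface {} is nil, not string".
		auth, ok := entry["auth"].(string)
		if !ok || auth == "" {
			fmt.Printf("invalid pull secret: missing or null auth for %q\n", registry)
			continue
		}
		fmt.Printf("registry %q has auth of length %d\n", registry, len(auth))
	}
}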

Description of problem:

Files generated by oc adm inspect sometimes have a leading "---" and sometimes do not, depending on the order in which objects are collected. This by itself is not an issue.

However, this becomes an issue when combined with multiple invocations of oc adm inspect that collect data into the same directory, as must-gather does.

If an object is collected multiple times, the second invocation might overwrite the original file improperly and leave 4 bytes of the original content behind.

This happens when the second invocation does not write the "---\n": the new content is 4 bytes shorter, so the trailing 4 bytes of the original file are left intact.

This garbage confuses YAML parsers.

Version-Release number of selected component (if applicable):

4.14 nightly as of Jul 25 and before

How reproducible:

Always

Steps to Reproduce:

Run oc adm inspect twice with different order of objects:

[msivak@x openshift-must-gather]$ oc adm inspect performanceprofile,machineconfigs,nodes --dest-dir=inspect.dual --all-namespaces
[msivak@x openshift-must-gather]$ oc adm inspect nodes --dest-dir=inspect.dual --all-namespaces


And then check the alphabetically first node yaml file - it will have garbage at the end of the file.

Actual results:

Garbage at the end of the file.

Expected results:

No garbage.

Additional info:

I believe this is caused by the lack of Truncate mode here https://github.com/openshift/oc/blob/master/pkg/cli/admin/inspect/writer.go#L54


Collecting data multiple times cannot be easily avoided when multiple collect scripts are combined with relatedObjects requested by operators.
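For reference, the symptom and the suggested fix boil down to opening the destination file with or without os.O_TRUNC. A minimal standalone sketch (not the oc writer code) that reproduces the leftover bytes and shows the truncating open:

// truncate_sketch.go: shows how rewriting a file with a 4-byte-shorter payload leaves the
// old trailing bytes behind unless os.O_TRUNC is used. Standalone sketch, not oc code.
package main

import (
	"fmt"
	"os"
)

func writeNoTrunc(path, content string) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE, 0o644) // no O_TRUNC
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.WriteString(content)
	return err
}

func main() {
	const path = "node.yaml"
	_ = writeNoTrunc(path, "---\nkind: Node\n") // first inspect run writes the leading "---\n"
	_ = writeNoTrunc(path, "kind: Node\n")      // second run omits it: 4 bytes shorter

	data, _ := os.ReadFile(path)
	fmt.Printf("%q\n", data) // "kind: Node\node\n" -- the 4 stale trailing bytes confuse YAML parsers

	// Fix: include os.O_TRUNC so the file is emptied before writing.
	f, _ := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o644)
	f.WriteString("kind: Node\n")
	f.Close()
}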

Description of problem:

etcd-backup fails with 'FIPS mode is enabled, but the required OpenSSL library is not available' on 4.13 FIPS enabled cluster

Version-Release number of selected component (if applicable):

OCP 4.13

How reproducible:

 

Steps to Reproduce:

1. run etcd-backup script on FIPS enabled OCP 4.13
2.
3.

Actual results:

backup script fails with

+ etcdctl snapshot save /home/core/assets/backup/snapshot_2023-08-28_125218.db
FIPS mode is enabled, but the required OpenSSL library is not available

Expected results:

successful run of etcd-backup script

Additional info:

4.13 uses RHEL9-based RHCOS while the etcd image is still based on RHEL8, and this could be the main issue. If so, the image should be rebuilt on RHEL9.

Description of problem:

I get a synchronization error in a fully disconnected environment when I synchronize twice with the target mirror and there is no change/diff between the first synchronization and the second. The first synchronization works; the second one returns an error and exit code -1.

 

This case occurs when you want to synchronize your disconnected registry regularly and there is no change between two synchronizations.

This case is presented hereafter:
https://docs.openshift.com/container-platform/4.11/installing/disconnected_install/installing-mirroring-disconnected.html#oc-mirror-differential-updates_installing-mirroring-disconnected

In documentation we have:

« Like this, the desired mirror content can be declared in the imageset configuration file statically while the mirror jobs are executed regularly, for example as part of a cron job. This way, the mirror can be kept up to date in an automated fashion”

The main question is how to synchronize a fully disconnected registry regularly (with no change between synchronizations) without returning an error.

 

Version-Release number of selected component (if applicable):

oc-mirror 4.11

 

How reproducible:

Follow https://docs.openshift.com/container-platform/4.11/installing/disconnected_install/installing-mirroring-disconnected.html#mirroring-image-set-full and synchronize twice with the target mirror.

 

Steps to Reproduce:

1. oc-mirror --from=output-dir/mirror_seq1_000000.tar  docker://quay-server.example.com/foo --dest-skip-tls 
2. oc-mirror --from=output-dir/mirror_seq1_000000.tar  docker://quay-server.example.com/foo --dest-skip-tls  

Actual results:

oc-mirror --from=output-dir/mirror_seq1_000000.tar  docker://quay-server.example.com/foo --dest-skip-tls 
Checking push permissions for quay-server.example.com Publishing image set from archive "output-dir/mirror_seq1_000000.tar" to registry "quay-server.example.com" error: error during publishing, expecting imageset with prefix mirror_seq2: invalid mirror sequence order, want 2, got 1

=> return -1

Expected results:

oc-mirror --from=output-dir/mirror_seq1_000000.tar  docker://quay-server.example.com/foo --dest-skip-tls 
...
No diff from last synchronization, nothing to do

=> return 0

 

Additional info:

The error is triggered in pkg/cli/mirror/sequence.go

+       default:
+               // Complete metadata checks
+               // UUID mismatch will now be seen as a new workspace.
+               klog.V(3).Info("Checking metadata sequence number")
+               currRun := current.PastMirror
+               incomingRun := incoming.PastMirror
+               if incomingRun.Sequence != (currRun.Sequence + 1) {
+                       return &ErrInvalidSequence{currRun.Sequence + 1, incomingRun.Sequence}
+               }

The error handling in ./pkg/cli/mirror/mirror.go could instead log a warning that there is no difference and return 0 instead of -1 (see the sketch after the code below).

          }
        case diskToMirror:
                dir, err := o.createResultsDir()
                if err != nil {
                        return err
                }
                o.OutputDir = dir

                // Publish from disk to registry
                // this takes care of syncing the metadata to the
                // registry backends.
                mapping, err = o.Publish(ctx)
                if err != nil {
                        serr := &ErrInvalidSequence{}
                        if errors.As(err, &serr) {
                                return fmt.Errorf("error during publishing, expecting imageset with prefix mirror_seq%d: %v", serr.wantSeq, err)
                        }
                        return err
                }
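One way to get the desired behaviour would be to treat an already-published sequence as a no-op rather than an error. A rough standalone sketch of that idea follows; it is not a real patch against oc-mirror, and the ErrSequenceAlreadyPublished type is invented for illustration.

// sequence_noop_sketch.go: when the incoming imageset sequence equals the last published
// one, report "nothing to do" and exit 0 instead of failing. Illustrative only.
package main

import (
	"errors"
	"fmt"
	"os"
)

// ErrSequenceAlreadyPublished is a hypothetical sentinel for "this archive was already mirrored".
type ErrSequenceAlreadyPublished struct{ Seq int }

func (e *ErrSequenceAlreadyPublished) Error() string {
	return fmt.Sprintf("imageset sequence %d already published", e.Seq)
}

// checkSequence mirrors the logic quoted from sequence.go, with an extra equal-sequence case.
func checkSequence(current, incoming int) error {
	switch {
	case incoming == current:
		return &ErrSequenceAlreadyPublished{Seq: incoming}
	case incoming != current+1:
		return fmt.Errorf("invalid mirror sequence order, want %d, got %d", current+1, incoming)
	default:
		return nil
	}
}

func main() {
	err := checkSequence(1, 1) // second run of mirror_seq1 against a registry already at seq 1
	var already *ErrSequenceAlreadyPublished
	if errors.As(err, &already) {
		fmt.Println("No diff from last synchronization, nothing to do")
		os.Exit(0)
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}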

 

 

 

 

 

Please review the following PR: https://github.com/openshift/cluster-monitoring-operator/pull/1914

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-13829. The following is the description of the original issue:

Description of problem:

The configured accessTokenInactivityTimeout under tokenConfig in HostedCluster doesn't have any effect.
1. The value is not getting updated in the oauth-openshift configmap.
2. HostedCluster allows the user to set an accessTokenInactivityTimeout value < 300s, whereas on a master cluster the value should be > 300s.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1. Install a fresh 4.13 hypershift cluster  
2. Configure accessTokenInactivityTimeout as below:
$ oc edit hc -n clusters
...
  spec:
    configuration:
      oauth:
        identityProviders:
        ...
        tokenConfig:          
          accessTokenInactivityTimeout: 100s
...
3. Check the hcp:
$ oc get hcp -oyaml
...
        tokenConfig:           
          accessTokenInactivityTimeout: 1m40s
...

4. Login to guest cluster with testuser-1 and get the token
$ oc login https://a8890bba21c9b48d4a05096eee8d4edd-738276775c71fb8f.elb.us-east-2.amazonaws.com:6443 -u testuser-1 -p xxxxxxx
$ TOKEN=`oc whoami -t`
$ oc login --token="$TOKEN"
WARNING: Using insecure TLS client config. Setting this option is not supported!
Logged into "https://a8890bba21c9b48d4a05096eee8d4edd-738276775c71fb8f.elb.us-east-2.amazonaws.com:6443" as "testuser-1" using the token provided.
You don't have any projects. You can try to create a new project, by running
    oc new-project <projectname>

Actual results:

1. HostedCluster allows the user to set a value < 300s for accessTokenInactivityTimeout, which is not possible on a master cluster (see the validation sketch below).

2. The value is not updated in oauth-openshift configmap:
$ oc get cm oauth-openshift -oyaml -n clusters-hypershift-ci-25785 
...
      tokenConfig:
        accessTokenMaxAgeSeconds: 86400
        authorizeTokenMaxAgeSeconds: 300
...

3. Login doesn't fail even if the user is not active for more than the set accessTokenInactivityTimeout seconds.

Expected results:

Login fails if the user is not active within the accessTokenInactivityTimeout seconds.
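As a side note on the validation gap, here is a minimal sketch of the kind of 300-second minimum guard that HostedCluster validation could apply. It is illustrative only, not the HyperShift implementation; the helper name and constant are made up.

// validate_token_timeout.go: illustrative check mirroring the >=300s minimum the
// standalone cluster enforces for accessTokenInactivityTimeout. Not HyperShift code.
package main

import (
	"fmt"
	"time"
)

const minAccessTokenInactivityTimeout = 300 * time.Second

func validateAccessTokenInactivityTimeout(d *time.Duration) error {
	if d == nil {
		return nil // unset means the platform default applies
	}
	if *d < minAccessTokenInactivityTimeout {
		return fmt.Errorf("accessTokenInactivityTimeout %s is below the minimum of %s",
			*d, minAccessTokenInactivityTimeout)
	}
	return nil
}

func main() {
	bad := 100 * time.Second // the value from the reproduction steps
	if err := validateAccessTokenInactivityTimeout(&bad); err != nil {
		fmt.Println("rejected:", err) // rejected: accessTokenInactivityTimeout 1m40s is below the minimum of 5m0s
	}
}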

Description of problem:

`rprivate`  default mount propagation in combination with `hostPath: path: /` breaks CSI driver relying on multipath

How reproducible:

Always

Steps to Reproduce (simplified):

1. ssh to node, 
2. mount a partition, for instance /dev/{s,v}da2, which on CoreOS is a UEFI FAT partition
    $ sudo mount /dev/vda2 /mnt
3. start a debug pod on that node (or any pod that does a hostPath mount of /, like the node tuning, machine config, or file integrity operand pods)
    $ oc debug nodes/master-2.sharedocp4upi411ovn.lab.upshift.rdu2.redhat.com
4. unmount the partition on node

5. notice the debug pod still has a reference to the filesystem
grep vda2 /proc/*/mountinfo
/proc/3687945/mountinfo:11219 10837 252:2 / /host/var/mnt rw,relatime - vfat /dev/vda2 rw,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,errors=remount-ro

6. On the node, although the mount is absent from /proc/mounts, the file system is still mounted, as shown by the dirty bit being still set on the FAT filesystem:

sudo fsck -n  /dev/vda2 
fsck from util-linux 2.32.1
fsck.fat 4.1 (2017-01-24)
0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.

Expected results:

File system is unmounted in host and in container.

Additional info:

Although the steps above show the behaviour in a simple way, this becomes quite problematic when using multipath on a host mount.
We noticed in a customer environment that we cannot reschedule some pods from an old node to a new node using oc adm drain when these pods have a Persistent Volume mount created by the third-party CSI driver block.csi.ibm.com.

The CSI driver uses multipath from CoreOS to manage multipath block devices; however, the multipath daemon blocks the volume removal from the node (the multipath -f flushing calls from the CSI driver always return busy; flushing a multipath device means removing it from the device tree in /dev, in storage parlance).

multipath flushes always fail because, although the multipath block device is unmounted on the host, the machine-config, file integrity and node tuning pods do hostPath volume mounts of /, the host root filesystem, and thus get a copy of the mounts.
Due to that mount copy, the kernel sees the filesystem as still in use, although there are no file descriptors open on that filesystem, and considers it unsafe to remove the multipath block device. The node CSI driver therefore cannot finish unmounting the volume, which blocks container creation on another node.

We can see these mount copies by looking at /proc/<container pid>/mountinfo:

$ grep mpathes proc/*/mountinfo
proc/3295781/mountinfo:56348 52693 253:42 / /var/lib/kubelet/plugins/kubernetes.io/csi/block.csi.ibm.com/12345/globalmount rw,relatime - xfs /dev/mapper/mpathes rw,seclabel,nouuid,attr2,inode64,logbufs=8,logbsize=32k,noquota

cri-o does this mount copy using `rprivate` mount propagation
(see https://github.com/cri-o/cri-o/blob/b098bec2d4d79bdf99c3ce89b0eeb16bfe8b5645/server/container_create_linux.go#L1030 )

the semantics of rprivate are mapped in `runc`
https://github.com/opencontainers/runc/blob/ba58ee9c3b9550c3e32b94802b0fb29761955290/libcontainer/specconv/spec_linux.go#L55
to mount flags passed to the mount(2) system call

MS_REC (since Linux 2.4.11)
              Used  in  conjunction  with  MS_BIND to create a recursive bind mount, and in
              conjunction with the propagation type flags to recursively change the  propa‐
              gation  type  of  all  of the mounts in a subtree.  See below for further de‐
              tails.

MS_PRIVATE
              Make this mount private.  Mount and unmount events do not propagate  into  or
              out of this mount.

The key here is the MS_PRIVATE flag. The unmounting of the multipath block device is not propagated to the mount namespaces of containers, keeping the filesystem eternally mounted and preventing the flushing of the multipath device.

Maybe hostPath mounts should be done using `rslave` mount propagation when we try to bind mount /var/lib? cri-dockerd seems to do something similar, according to https://kubernetes.io/docs/concepts/storage/volumes/#mount-propagation (see the sketch below).
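In pod-spec terms, `rslave` corresponds to `mountPropagation: HostToContainer` on the volumeMount. A minimal sketch of a hostPath mount declared that way follows; it is illustrative only, and whether the affected operands can switch to it is the open question above.

// hostpath_rslave_sketch.go: builds a container volume/mount pair where the hostPath
// mount of / uses HostToContainer propagation (rslave), so host-side unmounts propagate
// into the pod. Illustrative only -- not a change to any specific operator.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	propagation := corev1.MountPropagationHostToContainer // maps to rslave in the runtime

	volume := corev1.Volume{
		Name: "host",
		VolumeSource: corev1.VolumeSource{
			HostPath: &corev1.HostPathVolumeSource{Path: "/"},
		},
	}
	mount := corev1.VolumeMount{
		Name:             "host",
		MountPath:        "/host",
		MountPropagation: &propagation,
	}
	fmt.Printf("volume: %+v\nmount: %+v\n", volume, mount)
}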

Description of problem:

While deploying 3671 SNOs via ACM and ZTP, 19 SNO clusters failed to install because the clusterversion object complained that the cluster operator operator-lifecycle-manager is not available.

Version-Release number of selected component (if applicable):

Hub OCP 4.12.14
SNO Deployed OCP 4.13.0-rc.6
ACM - 2.8.0-DOWNSTREAM-2023-04-30-18-44-29

How reproducible:

19 out of 51 failed clusters out of 3671 total installs
~0.5% of installs might experience this; however, it represents ~37% of all install failures.

Steps to Reproduce:

1.
2.
3.

Actual results:

# cat cluster-install-failures | grep OLM | awk '{print $1}' | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion --no-headers"
vm00096 version         False   True   15h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available                                 
vm00334 version         False   True   19h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available                                 
vm00593 version         False   True   19h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available                                 
vm01095 version         False   True   19h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available                                 
vm01192 version         False   True   19h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available                                 
vm01447 version         False   True   18h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm01566 version         False   True   19h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm01707 version         False   True   17h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm01742 version         False   True   15h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm01798 version         False   True   13h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm01810 version         False   True   19h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm02020 version         False   True   19h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm02091 version         False   True   20h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm02363 version         False   True   13h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm02590 version         False   True   20h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm02908 version         False   True   18h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm03253 version         False   True   14h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm03500 version         False   True   17h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available
vm03654 version         False   True   17h   Unable to apply 4.13.0-rc.6: the cluster operator operator-lifecycle-manager is not available

Expected results:

 

Additional info:

There appear to be two distinguishing failure signatures in the list of cluster operators: every cluster shows that OLM is not available and is degraded, and more than half of the clusters show no information regarding the operator-lifecycle-manager-packageserver.

# cat cluster-install-failures | grep OLM | awk '{print $1}' | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get co operator-lifecycle-manager --no-headers"
vm00096 operator-lifecycle-manager         False   True   True   15h   
vm00334 operator-lifecycle-manager         False   True   True   19h   
vm00593 operator-lifecycle-manager         False   True   True   19h   
vm01095 operator-lifecycle-manager         False   True   True   19h   
vm01192 operator-lifecycle-manager         False   True   True   19h   
vm01447 operator-lifecycle-manager         False   True   True   18h   
vm01566 operator-lifecycle-manager         False   True   True   19h   
vm01707 operator-lifecycle-manager         False   True   True   17h   
vm01742 operator-lifecycle-manager         False   True   True   15h   
vm01798 operator-lifecycle-manager         False   True   True   13h   
vm01810 operator-lifecycle-manager         False   True   True   19h   
vm02020 operator-lifecycle-manager         False   True   True   19h   
vm02091 operator-lifecycle-manager         False   True   True   20h   
vm02363 operator-lifecycle-manager         False   True   True   13h   
vm02590 operator-lifecycle-manager         False   True   True   20h   
vm02908 operator-lifecycle-manager         False   True   True   18h   
vm03253 operator-lifecycle-manager         False   True   True   14h   
vm03500 operator-lifecycle-manager         False   True   True   17h   
vm03654 operator-lifecycle-manager         False   True   True   17h
# cat cluster-install-failures | grep OLM | awk '{print $1}' | xargs -I % sh -c "echo -n '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get co operator-lifecycle-manager-packageserver --no-headers"
vm00096 operator-lifecycle-manager-packageserver                                 
vm00334 operator-lifecycle-manager-packageserver         False   True   False   19h   
vm00593 operator-lifecycle-manager-packageserver         False   True   False   19h   
vm01095 operator-lifecycle-manager-packageserver                                 
vm01192 operator-lifecycle-manager-packageserver                                 
vm01447 operator-lifecycle-manager-packageserver                                 
vm01566 operator-lifecycle-manager-packageserver         False   True   False   19h   
vm01707 operator-lifecycle-manager-packageserver                                 
vm01742 operator-lifecycle-manager-packageserver         False   True   False   15h   
vm01798 operator-lifecycle-manager-packageserver                                 
vm01810 operator-lifecycle-manager-packageserver                                 
vm02020 operator-lifecycle-manager-packageserver                                 
vm02091 operator-lifecycle-manager-packageserver         False   True   False   20h   
vm02363 operator-lifecycle-manager-packageserver         False   True   False   13h   
vm02590 operator-lifecycle-manager-packageserver         False   True   False   20h   
vm02908 operator-lifecycle-manager-packageserver         False   True   False   18h   
vm03253 operator-lifecycle-manager-packageserver                                 
vm03500 operator-lifecycle-manager-packageserver                                 
vm03654 operator-lifecycle-manager-packageserver

Viewing the pods in the openshift-operator-lifecycle-manager namespace for these clusters shows no packageserver pod:

# cat cluster-install-failures | grep OLM | awk '{print $1}' | xargs -I % sh -c "echo '% '; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get po -n openshift-operator-lifecycle-manager"
vm00096
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-9rm9j         1/1     Running     1 (15h ago)   15h
collect-profiles-28053720-kbsdn          0/1     Completed   0             33m
collect-profiles-28053735-dzkf8          0/1     Completed   0             18m
collect-profiles-28053750-skvcn          0/1     Completed   0             3m1s
olm-operator-66658fffbb-gj294            1/1     Running     0             15h
package-server-manager-654759688-bxnwj   1/1     Running     0             15h
vm00334
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-xcw9r         1/1     Running     1 (19h ago)   19h
collect-profiles-28053720-ppq6x          0/1     Completed   0             32m
collect-profiles-28053735-r2rvw          0/1     Completed   0             18m
collect-profiles-28053750-lgb4r          0/1     Completed   0             3m2s
olm-operator-66658fffbb-t4nxg            1/1     Running     0             19h
package-server-manager-654759688-6n7gp   1/1     Running     0             19h
vm00593
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-rwfwp         1/1     Running     1 (19h ago)   19h
collect-profiles-28053720-7p6tq          0/1     Completed   0             33m
collect-profiles-28053735-nqzn9          0/1     Completed   0             18m
collect-profiles-28053750-zppm6          0/1     Completed   0             3m2s
olm-operator-66658fffbb-4gcpv            1/1     Running     0             19h
package-server-manager-654759688-rbjdw   1/1     Running     0             19h
vm01095
NAME                                     READY   STATUS      RESTARTS   AGE
catalog-operator-94b8bfddc-2tp6j         1/1     Running     0          19h
collect-profiles-28053720-bnrfz          0/1     Completed   0          33m
collect-profiles-28053735-p8bl5          0/1     Completed   0          18m
collect-profiles-28053750-mg9nv          0/1     Completed   0          3m2s
olm-operator-66658fffbb-cb95l            1/1     Running     0          19h
package-server-manager-654759688-2mqdm   1/1     Running     0          19h
vm01192
NAME                                     READY   STATUS      RESTARTS   AGE
catalog-operator-94b8bfddc-2crgg         1/1     Running     0          19h
collect-profiles-28053720-2rknm          0/1     Completed   0          33m
collect-profiles-28053735-wc5dn          0/1     Completed   0          18m
collect-profiles-28053750-g5bhj          0/1     Completed   0          3m2s
olm-operator-66658fffbb-5hlh4            1/1     Running     0          19h
package-server-manager-654759688-xfp24   1/1     Running     0          19h
vm01447
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-p8gd4         1/1     Running     0             18h
collect-profiles-28053720-kjw4w          0/1     Completed   0             33m
collect-profiles-28053735-k7xxp          0/1     Completed   0             17m
collect-profiles-28053750-fn5gq          0/1     Completed   0             3m3s
olm-operator-66658fffbb-rshjq            1/1     Running     1 (18h ago)   18h
package-server-manager-654759688-hrmfd   1/1     Running     0             18h
vm01566
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-gbrnj         1/1     Running     0             19h
collect-profiles-28053720-2wdcp          0/1     Completed   0             33m
collect-profiles-28053735-t7x5b          0/1     Completed   0             18m
collect-profiles-28053750-wdmtt          0/1     Completed   0             3m3s
olm-operator-66658fffbb-fsxrx            1/1     Running     0             19h
package-server-manager-654759688-4mdz8   1/1     Running     1 (19h ago)   19h
vm01707
NAME                                     READY   STATUS      RESTARTS   AGE
catalog-operator-94b8bfddc-f2ns6         1/1     Running     0          17h
collect-profiles-28053720-72sjt          0/1     Completed   0          33m
collect-profiles-28053735-qzgx4          0/1     Completed   0          18m
collect-profiles-28053750-mrpbl          0/1     Completed   0          3m3s
olm-operator-66658fffbb-jwp2l            1/1     Running     0          17h
package-server-manager-654759688-f7bm4   1/1     Running     0          17h
vm01742
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-lhv6f         1/1     Running     1 (15h ago)   15h
collect-profiles-28053720-4kqtf          0/1     Completed   0             33m
collect-profiles-28053735-hw7kp          0/1     Completed   0             18m
collect-profiles-28053750-6ztq2          0/1     Completed   0             3m4s
olm-operator-66658fffbb-5sqlc            1/1     Running     0             15h
package-server-manager-654759688-n6sms   1/1     Running     0             15h
vm01798
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-kx7nx         1/1     Running     2 (13h ago)   13h
collect-profiles-28053720-7vlqq          0/1     Completed   0             33m
collect-profiles-28053735-m8ltn          0/1     Completed   0             18m
collect-profiles-28053750-hrfnk          0/1     Completed   0             3m4s
olm-operator-66658fffbb-5z74m            1/1     Running     1 (13h ago)   13h
package-server-manager-654759688-6jbnz   1/1     Running     0             13h
vm01810
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-v5vr6         1/1     Running     2 (19h ago)   19h
collect-profiles-28053720-m26dn          0/1     Completed   0             33m
collect-profiles-28053735-64j7f          0/1     Completed   0             18m
collect-profiles-28053750-qf69b          0/1     Completed   0             3m4s
olm-operator-66658fffbb-gxt2b            1/1     Running     0             19h
package-server-manager-654759688-dz6p6   1/1     Running     0             19h
vm02020
NAME                                     READY   STATUS      RESTARTS   AGE
catalog-operator-94b8bfddc-2qqk6         1/1     Running     0          19h
collect-profiles-28053720-5cktx          0/1     Completed   0          33m
collect-profiles-28053735-ls6n9          0/1     Completed   0          18m
collect-profiles-28053750-bj6gl          0/1     Completed   0          3m4s
olm-operator-66658fffbb-zsr4g            1/1     Running     0          19h
package-server-manager-654759688-2dnfd   1/1     Running     0          19h
vm02091
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-whftg         1/1     Running     1 (20h ago)   20h
collect-profiles-28053720-zqcbs          0/1     Completed   0             33m
collect-profiles-28053735-v8lf5          0/1     Completed   0             18m
collect-profiles-28053750-rshdd          0/1     Completed   0             3m5s
olm-operator-66658fffbb-876ps            1/1     Running     0             20h
package-server-manager-654759688-smc8q   1/1     Running     0             20h
vm02363
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-zgn5m         1/1     Running     1 (13h ago)   13h
collect-profiles-28053720-dpkqq          0/1     Completed   0             33m
collect-profiles-28053735-nfqmf          0/1     Completed   0             18m
collect-profiles-28053750-jfhdz          0/1     Completed   0             3m5s
olm-operator-66658fffbb-bbrgb            1/1     Running     1 (13h ago)   13h
package-server-manager-654759688-7pv96   1/1     Running     0             13h
vm02590
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-v9mvc         1/1     Running     2 (20h ago)   20h
collect-profiles-28053720-pfcbd          0/1     Completed   0             33m
collect-profiles-28053735-5dxbl          0/1     Completed   0             18m
collect-profiles-28053750-95f6g          0/1     Completed   0             3m5s
olm-operator-66658fffbb-5knlj            1/1     Running     0             20h
package-server-manager-654759688-7qkgb   1/1     Running     0             20h
vm02908
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-cnmjf         1/1     Running     0             18h
collect-profiles-28053720-ks6h7          0/1     Completed   0             33m
collect-profiles-28053735-r682b          0/1     Completed   0             18m
collect-profiles-28053750-9jrx4          0/1     Completed   0             3m5s
olm-operator-66658fffbb-7bd2v            1/1     Running     1 (18h ago)   18h
package-server-manager-654759688-5r6gq   1/1     Running     0             18h
vm03253
NAME                                     READY   STATUS      RESTARTS      AGE
catalog-operator-94b8bfddc-8wtgg         1/1     Running     2 (14h ago)   14h
collect-profiles-28053720-kwcgk          0/1     Completed   0             33m
collect-profiles-28053735-dv5hx          0/1     Completed   0             18m
collect-profiles-28053750-8xbmw          0/1     Completed   0             3m6s
olm-operator-66658fffbb-f2n9f            1/1     Running     0             14h
package-server-manager-654759688-tjlc9   1/1     Running     0             14h
vm03500
NAME                                     READY   STATUS      RESTARTS   AGE
catalog-operator-94b8bfddc-wdq9b         1/1     Running     0          17h
collect-profiles-28053720-jcmwf          0/1     Completed   0          33m
collect-profiles-28053735-tjw5j          0/1     Completed   0          18m
collect-profiles-28053750-5mjq9          0/1     Completed   0          3m6s
olm-operator-66658fffbb-q92bg            1/1     Running     0          17h
package-server-manager-654759688-2z656   1/1     Running     0          17h
vm03654
NAME                                     READY   STATUS      RESTARTS   AGE
catalog-operator-94b8bfddc-vq9wt         1/1     Running     0          17h
collect-profiles-28053720-dlknz          0/1     Completed   0          33m
collect-profiles-28053735-mshs7          0/1     Completed   0          18m
collect-profiles-28053750-86xrc          0/1     Completed   0          3m6s
olm-operator-66658fffbb-5qd99            1/1     Running     0          17h

 

 

We should check if CBT (changed block tracking) is enabled on the cluster's nodes on the vSphere platform (see the sketch below):

1. Perform a full sweep and log each node which has CBT enabled.
2. Create an alert if some VMs have CBT enabled and others don't.
3. The alert should not be emitted if all VMs in the cluster have CBT uniformly enabled.

This will avoid issues like - https://issues.redhat.com/browse/OCPBUGS-12249?filter=12399251
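A rough sketch of such a sweep using govmomi follows. It assumes vCenter credentials are already available and only reports uniformity; mapping VMs back to cluster nodes and alerting are out of scope. This is not the operator's implementation.

// cbt_sweep_sketch.go: lists VMs and reports whether changed block tracking (CBT) is
// uniformly enabled across them. Sketch only; credential handling is a placeholder.
package main

import (
	"context"
	"fmt"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/view"
	"github.com/vmware/govmomi/vim25/mo"
)

func main() {
	ctx := context.Background()
	u, _ := url.Parse("https://user:pass@vcenter.example.com/sdk") // placeholder vCenter URL
	client, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		panic(err)
	}

	m := view.NewManager(client.Client)
	v, err := m.CreateContainerView(ctx, client.ServiceContent.RootFolder, []string{"VirtualMachine"}, true)
	if err != nil {
		panic(err)
	}
	defer v.Destroy(ctx)

	var vms []mo.VirtualMachine
	if err := v.Retrieve(ctx, []string{"VirtualMachine"}, []string{"name", "config.changeTrackingEnabled"}, &vms); err != nil {
		panic(err)
	}

	enabled, disabled := 0, 0
	for _, vm := range vms {
		cbt := vm.Config != nil && vm.Config.ChangeTrackingEnabled != nil && *vm.Config.ChangeTrackingEnabled
		fmt.Printf("%s: CBT=%t\n", vm.Name, cbt)
		if cbt {
			enabled++
		} else {
			disabled++
		}
	}
	if enabled > 0 && disabled > 0 {
		fmt.Println("ALERT: CBT is enabled on some VMs but not all") // the mixed state is what the alert targets
	}
}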

Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-13253.

Description of problem:

Fixed by @wking, opening bug for Jira linking.

The cluster-dns-operator sets the status condition's lastTransitionTime whenever the status (true, false, unknown), reason, or message changes on a condition.

It should only set the lastTransitionTime if the condition status changes. Otherwise this can have an effect on status flapping between true and false. See https://github.com/openshift/api/blob/master/config/v1/types_cluster_operator.go#L129
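The fix is the usual pattern: carry over the previous lastTransitionTime whenever only the reason or message changes. A minimal sketch with the openshift/api ClusterOperator types follows; it is an illustrative helper, not the cluster-dns-operator's actual code.

// set_condition_sketch.go: only bumps LastTransitionTime when the condition Status
// actually flips; reason/message updates keep the old timestamp. Illustrative helper.
package main

import (
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func setCondition(conds []configv1.ClusterOperatorStatusCondition, updated configv1.ClusterOperatorStatusCondition) []configv1.ClusterOperatorStatusCondition {
	updated.LastTransitionTime = metav1.Now()
	for i := range conds {
		if conds[i].Type != updated.Type {
			continue
		}
		if conds[i].Status == updated.Status {
			// Same status: keep the original transition time even if reason/message changed.
			updated.LastTransitionTime = conds[i].LastTransitionTime
		}
		conds[i] = updated
		return conds
	}
	return append(conds, updated)
}

func main() {
	conds := setCondition(nil, configv1.ClusterOperatorStatusCondition{
		Type: configv1.OperatorDegraded, Status: configv1.ConditionTrue, Message: "pod A stopped",
	})
	first := conds[0].LastTransitionTime

	conds = setCondition(conds, configv1.ClusterOperatorStatusCondition{
		Type: configv1.OperatorDegraded, Status: configv1.ConditionTrue, Message: "pods A and B stopped",
	})
	fmt.Println("timestamp unchanged:", conds[0].LastTransitionTime.Equal(&first)) // true
}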

Version-Release number of selected component (if applicable):

4.15 and earlier

How reproducible:

100%

Steps to Reproduce:

1. Put cluster-dns-operator in a Degraded condition by stopping a pod, notice the lastTransitionTime
2. Wait 1 second and stop another pod, which only updates the condition message

Actual results:

Notice the lastTransitionTime for the Degraded condition changes when the message changes, even though the status is still Degraded=true

Expected results:

The lastTransitionTime should change only when the Degraded status itself changes, not when the message or reason changes.

Additional info:

 

Apart from the default SC, we should check whether non-default SCs created on the vSphere platform use a datastore to which OCP has access and the necessary permissions (see the sketch below).

This will avoid hard-to-debug errors in cases where a customer creates an additional SC but forgets to grant the necessary permissions on the newer datastore.
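A rough sketch of the kind of sweep such a check could do: list StorageClasses using the vSphere CSI provisioner and surface the datastore each one points at, so a permission check can then run per datastore. The provisioner name and the datastoreurl parameter key are assumptions based on the vSphere CSI driver's conventions, not the operator's implementation.

// sc_datastore_sweep_sketch.go: lists vSphere CSI StorageClasses and the datastore each
// references, as a starting point for a per-datastore permission check. Sketch only.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	scs, err := client.StorageV1().StorageClasses().List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, sc := range scs.Items {
		if sc.Provisioner != "csi.vsphere.vmware.com" { // assumed vSphere CSI provisioner name
			continue
		}
		ds := sc.Parameters["datastoreurl"] // assumed parameter key; empty means the driver default
		isDefault := sc.Annotations["storageclass.kubernetes.io/is-default-class"] == "true"
		fmt.Printf("StorageClass %s (default=%t) datastore=%q\n", sc.Name, isDefault, ds)
		// A follow-up step would verify OCP's vCenter credentials have the required
		// privileges on this datastore and emit an alert otherwise.
	}
}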

Seen in build02, currently running 4.12.0-ec.3:

mcd_update_state{node="build0-gstfj-m-0.c.openshift-ci-build-farm.internal"}

returns:

Those are identical, except:

  • The first has config populated with rendered-... and has a non-zero value.
  • The second has config empty and has a zero value.

Looking at the backing code, my guess is that we're doing something like this:

  • Things are happy; export with a populated config.
  • Things get sad. Export with an empty config and a new error. But the happy time-series sticks around, and somehow has the value move to zero.
  • Things get happy again; and we return to setting a value for the happy time-series. But the sad time-series sticks around, and somehow has the value move to zero.

Or something like that.  I expect we want to drop the zero-valued time-series, but I'm not clear enough on how the MCO pushes values into the export set to have code suggestions.

 

See this component readiness page.

test=[sig-cluster-lifecycle] cluster upgrade should complete in 105.00 minutes

Appears to indicate we're now taking longer than 105 minutes about 7% of the time, previously never.

Slack thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1694547497553699

wking points out it may be a one time ovn IC thing. Find out what's up and route to appropriate team.

Currently the assisted installer doesn't verify that etcd is OK before rebooting the bootstrap node, as wait_for_ceo in bootkube does nothing.

In 4.13 (and backported to 4.12) the etcd team added a status that we can check in the assisted installer to decide whether it is safe to reboot the bootstrap node. We should check it before running the shutdown command.

Eran Cohen Rom Freiman 

Please review the following PR: https://github.com/openshift/hypershift/pull/2467

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-13960.

Description of problem:

This bug is for OCPNODE-1800 backport

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

CPMS creates two replacement machines when deleting a master machine on vSphere.

Sorry, I have to revisit https://issues.redhat.com/browse/OCPBUGS-4297 as I see all the related PRs are merged, but I hit this twice on the
ipi-on-vsphere/versioned-installer-vmc7-ovn-winc-thin_pvc-ci template cluster and once on the ipi-on-vsphere/versioned-installer-vmc7-ovn template cluster today.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-02-13-235211

How reproducible:

Three times

Steps to Reproduce:

1. On the ipi-on-vsphere/versioned-installer-vmc7-ovn-winc-thin_pvc-ci template cluster, the first time I hit this was after updating all 3 master machines using the RollingUpdate strategy and then deleting a master machine. But it seems the redundant machine was automatically deleted, because there was only one replacement machine when I revisited it.

liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                               PHASE     TYPE   REGION   ZONE   AGE
huliu-vs15b-75tr7-master-djlxv-2   Running                          47m
huliu-vs15b-75tr7-master-h76sp-1   Running                          58m
huliu-vs15b-75tr7-master-wtzb7-0   Running                          70m
huliu-vs15b-75tr7-worker-gzsp9     Running                          4h43m
huliu-vs15b-75tr7-worker-vcqqh     Running                          4h43m
winworker-4cltm                    Running                          4h19m
winworker-qd4c4                    Running                          4h19m
liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-vs15b-75tr7-master-djlxv-2
machine.machine.openshift.io "huliu-vs15b-75tr7-master-djlxv-2" deleted
^C
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                               PHASE          TYPE   REGION   ZONE   AGE
huliu-vs15b-75tr7-master-bzd4h-2   Provisioning                          34s
huliu-vs15b-75tr7-master-djlxv-2   Deleting                              48m
huliu-vs15b-75tr7-master-gzhlk-2   Provisioning                          35s
huliu-vs15b-75tr7-master-h76sp-1   Running                               59m
huliu-vs15b-75tr7-master-wtzb7-0   Running                               70m
huliu-vs15b-75tr7-worker-gzsp9     Running                               4h44m
huliu-vs15b-75tr7-worker-vcqqh     Running                               4h44m
winworker-4cltm                    Running                               4h20m
winworker-qd4c4                    Running                               4h20m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                               PHASE     TYPE   REGION   ZONE   AGE
huliu-vs15b-75tr7-master-bzd4h-2   Running                          38m
huliu-vs15b-75tr7-master-h76sp-1   Running                          97m
huliu-vs15b-75tr7-master-wtzb7-0   Running                          108m
huliu-vs15b-75tr7-worker-gzsp9     Running                          5h22m
huliu-vs15b-75tr7-worker-vcqqh     Running                          5h22m
winworker-4cltm                    Running                          4h57m
winworker-qd4c4                    Running                          4h57m 

2. Then I change the strategy to OnDelete; after updating all 3 master machines using the OnDelete strategy, I delete a master machine.

liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                               PHASE     TYPE   REGION   ZONE   AGE
huliu-vs15b-75tr7-master-hzhgq-0   Running                          137m
huliu-vs15b-75tr7-master-kj9zf-2   Running                          89m
huliu-vs15b-75tr7-master-kz6cx-1   Running                          59m
huliu-vs15b-75tr7-worker-gzsp9     Running                          7h46m
huliu-vs15b-75tr7-worker-vcqqh     Running                          7h46m
winworker-4cltm                    Running                          7h21m
winworker-qd4c4                    Running                          7h21m
liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-vs15b-75tr7-master-hzhgq-0
machine.machine.openshift.io "huliu-vs15b-75tr7-master-hzhgq-0" deleted
^C
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                               PHASE          TYPE   REGION   ZONE   AGE
huliu-vs15b-75tr7-master-hzhgq-0   Deleting                              138m
huliu-vs15b-75tr7-master-kb687-0   Provisioning                          26s
huliu-vs15b-75tr7-master-kj9zf-2   Running                               90m
huliu-vs15b-75tr7-master-kz6cx-1   Running                               60m
huliu-vs15b-75tr7-master-qn6kq-0   Provisioning                          26s
huliu-vs15b-75tr7-worker-gzsp9     Running                               7h47m
huliu-vs15b-75tr7-worker-vcqqh     Running                               7h47m
winworker-4cltm                    Running                               7h22m
winworker-qd4c4                    Running                               7h22m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                               PHASE     TYPE   REGION   ZONE   AGE
huliu-vs15b-75tr7-master-kb687-0   Running                          154m
huliu-vs15b-75tr7-master-kj9zf-2   Running                          4h5m
huliu-vs15b-75tr7-master-kz6cx-1   Running                          3h34m
huliu-vs15b-75tr7-master-qn6kq-0   Running                          154m
huliu-vs15b-75tr7-worker-gzsp9     Running                          10h
huliu-vs15b-75tr7-worker-vcqqh     Running                          10h
winworker-4cltm                    Running                          9h
winworker-qd4c4                    Running                          9h
liuhuali@Lius-MacBook-Pro huali-test % oc get co     
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.13.0-0.nightly-2023-02-13-235211   True        False         False      5h13m   
baremetal                                  4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
cloud-controller-manager                   4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
cloud-credential                           4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
cluster-autoscaler                         4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
config-operator                            4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
console                                    4.13.0-0.nightly-2023-02-13-235211   True        False         False      145m    
control-plane-machine-set                  4.13.0-0.nightly-2023-02-13-235211   True        False         True       10h     Observed 1 updated machine(s) in excess for index 0
csi-snapshot-controller                    4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
dns                                        4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
etcd                                       4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
image-registry                             4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
ingress                                    4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
insights                                   4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
kube-apiserver                             4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
kube-controller-manager                    4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
kube-scheduler                             4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
kube-storage-version-migrator              4.13.0-0.nightly-2023-02-13-235211   True        False         False      6h18m   
machine-api                                4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
machine-approver                           4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
machine-config                             4.13.0-0.nightly-2023-02-13-235211   True        False         False      3h59m   
marketplace                                4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
monitoring                                 4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
network                                    4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
node-tuning                                4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
openshift-apiserver                        4.13.0-0.nightly-2023-02-13-235211   True        False         False      145m    
openshift-controller-manager               4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
openshift-samples                          4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
operator-lifecycle-manager                 4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
operator-lifecycle-manager-catalog         4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
operator-lifecycle-manager-packageserver   4.13.0-0.nightly-2023-02-13-235211   True        False         False      6h7m    
service-ca                                 4.13.0-0.nightly-2023-02-13-235211   True        False         False      10h     
storage                                    4.13.0-0.nightly-2023-02-13-235211   True        False         False      3h57m   
liuhuali@Lius-MacBook-Pro huali-test %  

3. On an ipi-on-vsphere/versioned-installer-vmc7-ovn template cluster:
after updating all 3 master machines using the RollingUpdate strategy, no issue;
then deleting a master machine, no issue;
then changing the strategy to OnDelete and replacing the master machines one by one; when I deleted the last one, two replacement machines were created.

liuhuali@Lius-MacBook-Pro huali-test % oc get co 
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.13.0-0.nightly-2023-02-13-235211   True        False         False      73m     
baremetal                                  4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
cloud-controller-manager                   4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
cloud-credential                           4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
cluster-autoscaler                         4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
config-operator                            4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
console                                    4.13.0-0.nightly-2023-02-13-235211   True        False         False      129m    
control-plane-machine-set                  4.13.0-0.nightly-2023-02-13-235211   True        True          False      9h      Observed 1 replica(s) in need of update
csi-snapshot-controller                    4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
dns                                        4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
etcd                                       4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
image-registry                             4.13.0-0.nightly-2023-02-13-235211   True        False         False      8h      
ingress                                    4.13.0-0.nightly-2023-02-13-235211   True        False         False      8h      
insights                                   4.13.0-0.nightly-2023-02-13-235211   True        False         False      8h      
kube-apiserver                             4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
kube-controller-manager                    4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
kube-scheduler                             4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
kube-storage-version-migrator              4.13.0-0.nightly-2023-02-13-235211   True        False         False      3h22m   
machine-api                                4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
machine-approver                           4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
machine-config                             4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
marketplace                                4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
monitoring                                 4.13.0-0.nightly-2023-02-13-235211   True        False         False      8h      
network                                    4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
node-tuning                                4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
openshift-apiserver                        4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
openshift-controller-manager               4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
openshift-samples                          4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
operator-lifecycle-manager                 4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
operator-lifecycle-manager-catalog         4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
operator-lifecycle-manager-packageserver   4.13.0-0.nightly-2023-02-13-235211   True        False         False      46m     
service-ca                                 4.13.0-0.nightly-2023-02-13-235211   True        False         False      9h      
storage                                    4.13.0-0.nightly-2023-02-13-235211   True        False         False      77m    
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                               PHASE     TYPE   REGION   ZONE   AGE
huliu-vs15a-kjm6h-master-55s4l-1   Running                          84m
huliu-vs15a-kjm6h-master-ppc55-2   Running                          3h4m
huliu-vs15a-kjm6h-master-rqb52-0   Running                          53m
huliu-vs15a-kjm6h-worker-6nbz7     Running                          9h
huliu-vs15a-kjm6h-worker-g84xg     Running                          9h
liuhuali@Lius-MacBook-Pro huali-test % oc delete machine huliu-vs15a-kjm6h-master-ppc55-2
machine.machine.openshift.io "huliu-vs15a-kjm6h-master-ppc55-2" deleted
^C
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                               PHASE          TYPE   REGION   ZONE   AGE
huliu-vs15a-kjm6h-master-55s4l-1   Running                               85m
huliu-vs15a-kjm6h-master-cvwzz-2   Provisioning                          27s
huliu-vs15a-kjm6h-master-ppc55-2   Deleting                              3h5m
huliu-vs15a-kjm6h-master-qp9m5-2   Provisioning                          27s
huliu-vs15a-kjm6h-master-rqb52-0   Running                               54m
huliu-vs15a-kjm6h-worker-6nbz7     Running                               9h
huliu-vs15a-kjm6h-worker-g84xg     Running                               9h
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                               PHASE     TYPE   REGION   ZONE   AGE
huliu-vs15a-kjm6h-master-55s4l-1   Running                          163m
huliu-vs15a-kjm6h-master-cvwzz-2   Running                          79m
huliu-vs15a-kjm6h-master-qp9m5-2   Running                          79m
huliu-vs15a-kjm6h-master-rqb52-0   Running                          133m
huliu-vs15a-kjm6h-worker-6nbz7     Running                          10h
huliu-vs15a-kjm6h-worker-g84xg     Running                          10h
liuhuali@Lius-MacBook-Pro huali-test % 

Actual results:

CPMS creates two replacement machines when a master machine is deleted, and both replacement machines persist for a long time

Expected results:

CPMS should create only one replacement machine when a master machine is deleted, or should quickly delete the redundant machine

Additional info:

Must-gather: https://drive.google.com/file/d/1aCyFn9okNxRz7nE3Yt_8g6Kx7sPSGCg2/view?usp=sharing for ipi-on-vsphere/versioned-installer-vmc7-ovn-winc-thin_pvc-ci template cluster
https://drive.google.com/file/d/1i0fWSP0-HqfdV5E0wcNevognLUQKecvl/view?usp=sharing for ipi-on-vsphere/versioned-installer-vmc7-ovn template cluster

DoD:

Currently we return early if we fail to apply a resource during installation: https://github.com/openshift/hypershift/blob/main/cmd/install/install.go#L248

There's no reason not to keep going: aggregate the errors and return them at the end.

It might help in scenarios where one broken CR prevents everything else from being installed, e.g.

https://redhat-internal.slack.com/archives/C02LM9FABFW/p1680599409023509?thread_ts=1680589848.540709&cid=C02LM9FABFW

 

This is a ticket created based on a GitHub comment from a user

Description of the problem:

 See GitHub comment

How reproducible:

 Unknown

Steps to reproduce:

1. See GitHub comment

Actual results:

DNS wildcard validation failure is a false positive

Expected results:

DNS wildcard validation should probably ignore domain-search

Please review the following PR: https://github.com/openshift/machine-api-provider-aws/pull/64

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/cloud-provider-vsphere/pull/37

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

In Helm charts we define a values.schema.json file, a JSON schema for all the possible values the user can set in a chart. This schema needs to follow the JSON Schema standard. The standard includes something called $ref, a reference to either a local or a remote definition. If we use a schema with remote references, it causes various troubles in OCP. Different OCP versions give different results, and even on the same OCP version you can get different results depending on how tightly the cluster networking is locked down.
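For illustration, here is a minimal hypothetical values.schema.json containing a remote reference; the property name and URL are placeholders, not the actual chart's schema:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "app": {
      "$ref": "https://example.com/schemas/app.schema.json"
    }
  }
}

Resolving such a $ref requires the console to fetch the remote document, which is why the behaviour varies with the OCP version and how restricted the cluster networking is.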

Prerequisites (if any, like setup, operators/versions):

Tried in Developer Sandbox, OpenShift Local, Baremetal Public Cluster in Operate First, OCP provisioned through clusterbot. It behaves differently in each instance. Individual cases are described below.

Steps to Reproduce

1. Go to the "Helm" tab in Developer Perspective
2. Click "Create" in top right and select "Repository"
3. Use the following ProjectHelmChartRepository resource and click "Create" (this repo contains a single chart, and that chart has a values.schema.json with the content linked below):

apiVersion: helm.openshift.io/v1beta1
kind: ProjectHelmChartRepository
metadata:
  name: reproducer
spec:
  connectionConfig:
    url: https://raw.githubusercontent.com/tumido/helm-backstage/reproducer

4. Go back to the "Helm" tab in the Developer Perspective
5. Click "Create" in top right and select "Helm Release"
6. In filters section of the catalog in the "Chart repositories" select "Reproducer"
7. Click on the single tile available (Backstage)
8. Click "Install Helm Chart"
9. Either you will be greeted with various error screens, or you will see the "YAML view" tab (this tab selection is not the default and appears to be remembered only for the user session)
10. Select "Form view"

Actual results:

Various error screens depending on OCP version and network restrictions. I've attached screen captures showing how it behaves in different settings.

Expected results:

Either render the form view (resolving the remote references) or make it obvious that remote references are not supported. Optionally fall back to the "YAML view", noting that the full schema is not available but the chart is still deployable.

Reproducibility (Always/Intermittent/Only Once):

Depends on the environment
Always in OpenShift Local, Developer Sandbox, cluster bot clusters

Build Details:

Workaround:

1. Select any other chart to install, click "Install Helm Chart"
2. Change the view to "YAML view"
3. Go back to the Helm catalog without actually deploying anything
4. Select the faulty chart and click "Install Helm Chart"
5. Proceed with installation

Additional info:

This is a clone of issue OCPBUGS-25228. The following is the description of the original issue:

This is a clone of issue OCPBUGS-24743. The following is the description of the original issue:

Description of problem:

For many 4.y releases now, since before 4.11 and including all the minor versions that are still supported, CRI-O has wiped images when it comes up after a node reboot and notices it has a new (minor?) version. This causes redundant pulls, as seen in this 4.11-to-4.12 update run:

$ curl -s  https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-upgrade-from-stable-4.11-e2e-azure-sdn-upgrade/1732741139229839360/artifacts/e2e-azure-sdn-upgrade/gather-extra/artifacts/nodes/ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4/journal | zgrep 'Starting update from rendered-\|crio-wipe\|Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2'
Dec 07 13:05:42.474144 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Succeeded.
Dec 07 13:05:42.481470 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Consumed 191ms CPU time
Dec 07 13:59:51.000686 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 crio[1498]: time="2023-12-07 13:59:51.000591203Z" level=info msg="Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2" id=a62bc972-67d7-401a-9640-884430bd16f1 name=/runtime.v1.ImageService/PullImage
Dec 07 14:00:55.745095 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 root[101294]: machine-config-daemon[99469]: Starting update from rendered-worker-ca36a33a83d49b43ed000fd422e09838 to rendered-worker-c0b3b4eadfe6cdfb595b97fa293a9204: &{osUpdate:true kargs:false fips:false passwd:false files:true units:true kernelType:false extensions:false}
Dec 07 14:05:33.274241 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Succeeded.
Dec 07 14:05:33.289605 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 systemd[1]: crio-wipe.service: Consumed 216ms CPU time
Dec 07 14:14:50.277011 ci-op-rzrplpjd-7f65d-vwrzs-worker-eastus21-lcgk4 crio[1573]: time="2023-12-07 14:14:50.276961087Z" level=info msg="Pulled image: registry.ci.openshift.org/ocp/4.12-2023-12-07-060628@sha256:3c3e67faf4b6e9e95bebb0462bd61c964170893cb991b5c4de47340a2f295dc2" id=1a092fbd-7ffa-475a-b0b7-0ab115dbe173 name=/runtime.v1.ImageService/PullImage

The redundant pulls cost network and disk traffic, and avoiding them should make those update-initiated reboots quicker and cheaper. The lack of update-initiated wipes is not expected to cost much, because the Kubelet's old-image garbage collection should be along to clear out any no-longer-used images if disk space gets tight.

Version-Release number of selected component (if applicable):

At least 4.11. Possibly older 4.y; I haven't checked.

How reproducible:

Every time.

Steps to Reproduce:

1. Install a cluster.
2. Update to a release image with a different CRI-O (minor?) version.
3. Check logs on the nodes.

Actual results:

crio-wipe entries in the logs, with reports of target-release images being pulled before and after those wipes, as I quoted in the Description.

Expected results:

Target-release images pulled before the reboot, and found in the local cache if that image is needed again post-reboot.

Description of problem:

machine-config-operator will fail on clusters deployed with IPI on Power Virtual Server with the following error:

Cluster not available for []: ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: spec.infra.status.platformStatus.powervs.resourceGroup: Invalid value: "": spec.infra.status.platformStatus.powervs.resourceGroup in body should match '^[a-zA-Z0-9-_ 

Version-Release number of selected component (if applicable):

4.14 and 4.13

How reproducible:

100%

Steps to Reproduce:

1. Deploy with openshift-installer to Power VS
2. Wait for masters to start deploying
3. Error will appear for the machine-config CO

Actual results:

MCO fails

Expected results:

MCO should come up

Additional info:

Fix has been identified

Description of problem:

In hypershift context:
Operands managed by Operators running in the hosted control plane namespace in the management cluster do not honour affinity opinions https://hypershift-docs.netlify.app/how-to/distribute-hosted-cluster-workloads/
https://github.com/openshift/hypershift/blob/main/support/config/deployment.go#L263-L265

These operands running management side should honour the same affinity, tolerations, node selector and priority rules as the operator.
This could be done by looking at the operator deployment itself or at the HCP resource.

aws-ebs-csi-driver-controller
aws-ebs-csi-driver-operator
csi-snapshot-controller
csi-snapshot-webhook


Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Create a hypershift cluster.
2. Check affinity rules and node selector of the operands above.
3.

Actual results:

Operands missing affinity rules and node selector

Expected results:

Operands have the same affinity rules and node selector as the operator

Additional info:

 

Please review the following PR: https://github.com/openshift/oauth-server/pull/119

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Bump Kubernetes to 0.27.1 and bump dependencies

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-17408. The following is the description of the original issue:

Description of problem:

An operator installPlan has duplicate key values for installPlan?.spec.clusterServiceVersionNames which is displayed in multiple pages in the management console.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-31-181848

How reproducible:

Always

Expected results:

In the screenshots linked below the clusterServiceVersionNames value should only display one item, but because there are duplicate key values it is listed twice.

Additional info:

This bug causes duplicate values to be shown in several pages of the Management Console. Screenshots:
https://drive.google.com/file/d/1OwiLXU8iETNusCf6N2AhB5y-ykXwgyBU/view?usp=drive_link

https://drive.google.com/file/d/1qfMso1x-s--samU7OmDKU-3NVfxqsxWD/view?usp=drive_link

https://drive.google.com/file/d/1Z9mGRllp4ZLN2OlSNKZY2QTIDx8QpyVS/view?usp=drive_link

https://drive.google.com/file/d/1CYWMpKy_KmUV_KfIxCjS1FAWHYbYA6rw/view?usp=drive_link

This is a clone of issue OCPBUGS-18401. The following is the description of the original issue:

Description of problem:

Go to the Home -> Events page and type a string in the filter field; the events are not filtered. (The search mode is fuzzy search by default.)

Version-Release number of selected component (if applicable):

 4.14.0-0.nightly-2023-08-28-154013

How reproducible:

Always

Steps to Reproduce:

1. Go to the Home -> Events page and type a string in the filter field
2.
3.

Actual results:

1. The events are not filtered.

Expected results:

1. Should filter out events containing the filter string.

Additional info:

Type filter could work on events page.

This came up a while ago, see https://groups.google.com/u/1/a/redhat.com/g/aos-devel/c/HuOTwtI4a9I/m/nX9mKjeqAAAJ

Basically this MC:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: worker-override
spec:
  kernelType: realtime
  osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b4cc3995d5fc11e3b22140d8f2f91f78834e86a210325cbf0525a62725f8e099 

 

Will degrade the node with

 

E0301 21:25:09.234001    3306 writer.go:200] Marking Degraded due to: error running rpm-ostree override remove kernel kernel-core kernel-modules kernel-modules-extra --install kernel-rt-core --install kernel-rt-modules --install kernel-rt-modules-extra --install kernel-rt-kvm: error: Could not depsolve transaction; 1 problem detected:
 Problem: package kernel-modules-core-5.14.0-282.el9.x86_64 requires kernel-uname-r = 5.14.0-282.el9.x86_64, but none of the providers can be installed
  - conflicting requests
: exit status 1
 

 

It's kind of annoying here because the packages to remove are now OS version dependent.  A while ago I filed https://github.com/coreos/rpm-ostree/issues/2542 which would push the problem down into rpm-ostree, which is in a better situation to deal with it, and that may be the fix...but it's also pushing the problem down there in a way that's going to be maintenance pain (but, we can deal with that).

 

It's also possible that we may need to explicitly request installation of `kernel-rt-modules-core`...I'll look.

Description of problem:

In order for Windows nodes to use the openshift-cluster-csi-drivers/internal-feature-states.csi.vsphere.vmware.com ConfigMap, which contains the configuration for vSphere CSI, `csi-windows-support` must be set to true.
This is documented here: https://github.com/kubernetes-sigs/vsphere-csi-driver/blob/833421f42475809b4f76ea125095b5120af0f8e1/docs/book/features/csi_driver_on_windows.md#how-to-enable-vsphere-csi-with-windows-nodes

Without this, a separate ConfigMap must be created and used for a user deploying Windows vSphere CSI drivers.
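For reference, a schematic sketch of that ConfigMap with the key this report asks to enable; any other feature-state keys managed by the driver operator are omitted here:

apiVersion: v1
kind: ConfigMap
metadata:
  name: internal-feature-states.csi.vsphere.vmware.com
  namespace: openshift-cluster-csi-drivers
data:
  csi-windows-support: "true"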

Version-Release number of selected component (if applicable):


How reproducible:

Always

Steps to Reproduce:

1. Add a Windows node to the cluster
2. Deploy vsphere csi daemonset for windows nodes as documented upstream
3. Add a Windows pod with a pvc mount

Actual results:

The pod is unable to mount the volume as windows support is not enabled

Expected results:

The pod can mount the volume

Additional info:


Description of the problem:

In BE 2.16.0, try to install a new cluster with ignore-validations enabled ({"host-validation-ids": "[\"all\"]", "cluster-validation-ids": "[\"all\"]"}) and one host with too little disk space (18 GB). The installation starts, but after 20 minutes of waiting the cluster is back in draft status without any event explaining why.

How reproducible:

100%

Steps to reproduce:

1. Create a new multi-node cluster and configure one of the hosts to have an 18 GB disk (the minimum requirement is 20 GB)

2. Enable ignore-validations by:

curl -X 'PUT' \
  'http://api.openshift.com/api/assisted-install/v2/clusters/eaffbd37-2a0b-42b2-a706-ad5b23ff17a3/ignored-validations' \
  --header "Authorization: Bearer $(ocm token)" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "ignored_host_validations": "[\"all\"]",
  "ignored_cluster_validations": "[\"all\"]"
}'
 

3. Start the installation. The cluster is stuck in prepare-for-installation for 20 minutes and then moves back to draft with no event about the reason.

Actual results:

 

Expected results:

Description of problem:

Techpreview parallel jobs are failing due to changes in the insights operator

Example failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-sdn-techpreview/1663408887002304512

Looks like it's from https://github.com/openshift/insights-operator/pull/764

https://sippy.dptools.openshift.org/sippy-ng/jobs/4.14/analysis?filters=%7B%22items%22%3A%5B%7B%22id%22%3A0%2C%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-techpreview%22%7D%2C%7B%22id%22%3A1%2C%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-sdn-techpreview%22%7D%2C%7B%22id%22%3A2%2C%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-nightly-4.14-e2e-vsphere-ovn-techpreview%22%7D%5D%2C%22linkOperator%22%3A%22or%22%7D


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Description of problem:

ccoctl does not prevent the user from using the same resource group name for the OIDC and installation resource groups which can result in resources existing in the resource group used for cluster installation. The OpenShift installer requires that the installation resource group be empty so OIDC and installation resource groups must be distinct.

ccoctl currently allows providing either --oidc-resource-group-name or --installation-resource-group-name but does not indicate a problem when those resource group names are the same. When the same resource group name is provided using a combination of the --name, --oidc-resource-group-name and --installation-resource-group-name parameters, ccoctl should exit with an error indicating that the resource group names must be different.

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

100%

Steps to Reproduce:

1. Run ccoctl azure create-all with a combination of --name, --oidc-resource-group-name or --installation-resource-group-name resulting in OIDC and installation resource group names being the same.

./ccoctl azure create-all --name "abutchertest" --region centralus --subscription-id "${SUBSCRIPTION_ID}"--credentials-requests-dir "${MYDIR}/credreqs" --oidc-resource-group-name test "abutchertest" --dnszone-resource-group-name "${DNS_RESOURCE_GROUP}"

ccoctl will default the installation resource group to match the provided --name parameter "abutchertest", which results in the OIDC and installation resource groups both being "abutchertest" since --oidc-resource-group-name uses the same name. This means that OIDC resources will be created in the resource group that will be configured for the OpenShift installer within the install-config.yaml.

2. Run the OpenShift installer having set .platform.azure.resourceGroupName in the install-config.yaml to be "abutchertest" and receive error that the installation resource group is not empty when running the installer. The resource identified will contain user-assigned managed identities meant to be created in the OIDC resource group which must be separate from the installation resource group.

FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": platform.azure.resourceGroupName: Invalid value: "abutchertest": resource group must be empty but it has 8 resources like...

Actual results:

ccoctl allows OIDC and installation resource group names to be the same.

Expected results:

ccoctl does not allow OIDC and installation resource groups to be the same.

Additional info:

 

Description of problem:

The installer gets stuck at the beginning of the installation if a BYO private hosted zone is configured in the install-config; from the CI logs, the installer performs no actions for 2 hours.

Errors:
level=info msg=Credentials loaded from the "default" profile in file "/var/run/secrets/ci.openshift.io/cluster-profile/.awscred"
185
{"component":"entrypoint","file":"k8s.io/test-infra/prow/entrypoint/run.go:164","func":"k8s.io/test-infra/prow/entrypoint.Options.ExecuteProcess","level":"error","msg":"Process did not finish before 2h0m0s timeout","severity":"error","time":"2023-03-05T16:44:27Z"}


Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-03-23-000343

How reproducible:

Always

Steps to Reproduce:

1. Create an install-config.yaml and configure a BYO private hosted zone (see the sketch below)
2. Create the cluster
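A minimal sketch of the relevant install-config stanza, assuming the BYO private hosted zone is referenced through the platform.aws.hostedZone field; the region and zone ID below are placeholders:

platform:
  aws:
    region: us-east-2
    hostedZone: Z0123456789EXAMPLE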

Actual results:

The installer showed the following message and then got stuck; the cluster cannot be created.

level=info msg=Credentials loaded from the "default" profile in file "/var/run/secrets/ci.openshift.io/cluster-profile/.awscred"

Expected results:

The cluster is created successfully.

Additional info:

 

Description of problem:

Alert Rules do not have summary/description

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

This bug is being raised by the OpenShift Monitoring team as part of an effort to detect invalid alert rules in OCP.

Check the details of the following alert rules:
1. KubeletHealthState
2. MCCDrainError
3. MCDPivotError
4. MCDRebootError
5. SystemMemoryExceedsReservation 

Actual results:

These Alert Rules do not have Summary/Description annotation, but have a 'message' annotation. OpenShift alerts must use 'description' -- consider renaming the annotation

Expected results:

Alerts should have Summary/Description annotation.

Additional info:

Alerts must have a summary/description annotation, please refer to style guide at https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide 


To resolve the bug:
- Rename the message annotation to summary/description annotations (see the sketch below)
- Remove the exception in the origin test, added in PR https://github.com/openshift/origin/pull/27944
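An illustrative sketch of the expected shape for one of these rules; the expression and annotation wording below are placeholders, not the actual machine-config rule definitions:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alert-annotations
spec:
  groups:
    - name: example
      rules:
        - alert: MCDRebootError
          expr: vector(1)   # placeholder expression
          labels:
            severity: critical
          annotations:
            summary: Reboot failed on a node during a machine config update.
            description: "Reboot failed on {{ $labels.node }}. Check the machine-config-daemon logs on that node."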

Description of problem:

According to https://docs.openshift.com/container-platform/4.11/release_notes/ocp-4-11-release-notes.html#ocp-4-11-deprecated-features-crio-parameters and Red Hat Insights, logSizeMax is deprecated in ContainerRuntimeConfig and should instead be set via containerLogMaxSize in KubeletConfig.
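A minimal sketch of that replacement, assuming a worker pool and an example limit of 50Mi:

apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: worker-container-log-size
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ''
  kubeletConfig:
    containerLogMaxSize: 50Mi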

When starting that transition, though, it was noticed that a ContainerRuntimeConfig as shown below would still add logSizeMax and even overlaySize to the ContainerRuntimeConfig spec.

$ bat /tmp/crio.yaml 
apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
 name: pidlimit
spec:
 machineConfigPoolSelector:
   matchLabels:
     pools.operator.machineconfiguration.openshift.io/worker: '' 
 containerRuntimeConfig:
   pidsLimit: 4096 
   logLevel: debug

$ oc get containerruntimeconfig  pidlimit -o json | jq '.spec.containerRuntimeConfig'
{
  "logLevel": "debug",
  "logSizeMax": "0",
  "overlaySize": "0",
  "pidsLimit": 4096
}

When checking on the OpenShift Container Platform 4 node using crio config, we can see that the values are not applied. Yet it's disturbing to see those options added in the specification when in fact Red Hat is recommending moving them into KubeletConfig and removing them from ContainerRuntimeConfig.

Further, having them still set in ContainerRuntimeConfig will trigger a false-positive alert in Red Hat Insights, as the customer may have followed the recommendation but the system does not appear to comply with the changes made :-)

Also interesting: a similar problem was reported a while ago in https://bugzilla.redhat.com/show_bug.cgi?id=1941936 and fixed, so it's surprising that this is coming back again.

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.13.4

How reproducible:

Always

Steps to Reproduce:

1. Install OpenShift Container Platform 4.13.4
2. Create ContainerRuntimeConfig as shown above and validate the actual object created
3. Run oc get containerruntimeconfig  pidlimit -o json | jq '.spec.containerRuntimeConfig' to validate the object created and inspect the spec.

Actual results:

$ oc get containerruntimeconfig  pidlimit -o json | jq '.spec.containerRuntimeConfig'
{
  "logLevel": "debug",
  "logSizeMax": "0",
  "overlaySize": "0",
  "pidsLimit": 4096
}

Expected results:

$ oc get containerruntimeconfig  pidlimit -o json | jq '.spec.containerRuntimeConfig'
{
  "logLevel": "debug",
  "pidsLimit": 4096
}

Additional info:


Description of problem:

Running `yarn dev` results in the build running in a loop. This issue appears to be related to changes in https://github.com/openshift/console/pull/12821.

How reproducible:

Always

Steps to Reproduce:

1. Run `yarn dev`
2. Make changes to a file and save
3. Watch the terminal output of `yarn dev` and note the build is looping

Description of problem:

This issue was supposed to be fixed in 4.13.4 but is happening again. Manually creating the directory "/etc/systemd/network" allows the upgrade to complete, but that is not a sustainable workaround when there are several clusters to update.

Version-Release number of selected component (if applicable):

4.13.4

How reproducible:

At customer environment.

Steps to Reproduce:

1. Update to 4.13.4 from 4.12.21
2.
3.

Actual results:

MCO degraded blocking the upgrade.

Expected results:

Upgrade to complete.

Additional info:

 

Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-20357.

Platform:

IPI on Baremetal

What happened?

In cases where no hostname is provided, hosts are automatically assigned the name "localhost" or "localhost.localdomain".

[kni@provisionhost-0-0 ~]$ oc get nodes
NAME STATUS ROLES AGE VERSION
localhost.localdomain Ready master 31m v1.22.1+6859754
master-0-1 Ready master 39m v1.22.1+6859754
master-0-2 Ready master 39m v1.22.1+6859754
worker-0-0 Ready worker 12m v1.22.1+6859754
worker-0-1 Ready worker 12m v1.22.1+6859754

What did you expect to happen?

Having all hosts come up as localhost is the worst possible user experience, because they'll fail to form a cluster but you won't know why.

However, we know the BMH name in the image-customization-controller, so it would be possible to configure the ignition to set a default hostname if we don't get one from DHCP/DNS.

If not, we should at least fail the installation with an error message specific to this situation.

----------
30/01/22 - adding how to reproduce
----------

How to Reproduce:

1) Prepare an installation with day-1 static IPs.

Add the following to the install-config under one of the nodes:
networkConfig:
  routes:
    config:
      - destination: 0.0.0.0/0
        next-hop-address: 192.168.123.1
        next-hop-interface: enp0s4
  dns-resolver:
    config:
      server:
        - 192.168.123.1
  interfaces:
    - name: enp0s4
      type: ethernet
      state: up
      ipv4:
        address:
          - ip: 192.168.123.110
            prefix-length: 24
        enabled: true

2) Ensure a DNS PTR record for the address is NOT configured.

3) Create manifests and the cluster from install-config.yaml

The installation should either:
1) fail as early as possible and provide some feedback that no hostname was provided, or
2) derive the hostname from the BMH or the ignition files.

Description of problem:


TRT has identified a likely regression in Metal IPv6 installations. 4.14 installs are statistically worse than 4.13. We are working on a new tool called Component Readiness that does cross-release comparisons to ensure nothing gets worse, and it appears to have found something in metal.

At GA, 4.13 metal installs for ipv6 upgrade micro jobs were 100%.  They are now around 89% in 4.14.  All the failures seem to have the same mode where no workers come up, with PXE errors in the serial console.  

(attachment: image-2023-06-06-10-13-13-310.png)

You can view the report here:

https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2023-05-16%2023%3A59%3A59&baseRelease=4.13&baseStartTime=2023-04-18%2000%3A00%3A00&capability=Other&component=Installer%20%2F%20openshift-installer&confidence=95&environment=ovn%20upgrade-micro%20amd64%20metal-ipi%20standard&excludeArches=arm64&excludeClouds=alibaba%2Cibmcloud%2Clibvirt%2Covirt&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&pity=5&platform=metal-ipi&sampleEndTime=2023-06-06%2023%3A59%3A59&sampleRelease=4.14&sampleStartTime=2023-05-09%2000%3A00%3A00&testId=cluster%20install%3A0cb1bb27e418491b1ffdacab58c5c8c0&testName=install%20should%20succeed%3A%20overall&upgrade=upgrade-micro&variant=standard

The serial console on the workers shows PXE errors:

>>Start PXE over IPv4.
  PXE-E18: Server response timeout.
BdsDxe: failed to load Boot0001 "UEFI PXEv4 (MAC:00962801D023)" from PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(00962801D023,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0): Not Found

>>Start PXE over IPv6..
  Station IP address is FD00:1101:0:0:2EE1:8456:96FB:68B1
  Server IP address is FD00:1101:0:0:0:0:0:3
  NBP filename is snponly.efi
  NBP filesize is 0 Bytes
  PXE-E18: Server response timeout.
BdsDxe: failed to load Boot0002 "UEFI PXEv6 (MAC:00962801D023)" from PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(00962801D023,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000): Not Found

>>Start HTTP Boot over IPv4.
  Error: Could not retrieve NBP file size from HTTP server.

  Error: Server response timeout.
BdsDxe: failed to load Boot0003 "UEFI HTTPv4 (MAC:00962801D023)" from PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(00962801D023,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri(): Not Found

>>Start HTTP Boot over IPv6..
  Error: Could not retrieve NBP file size from HTTP server.

  Error: Remote boot cancelled.
BdsDxe: failed to load Boot0004 "UEFI HTTPv6 (MAC:00962801D023)" from PciRoot(0x0)/Pci(0x2,0x0)/Pci(0x0,0x0)/MAC(00962801D023,0x1)/IPv6(0000:0000:0000:0000:0000:0000:0000:0000,0x0,Static,0000:0000:0000:0000:0000:0000:0000:0000,0x40,0000:0000:0000:0000:0000:0000:0000:0000)/Uri(): Not Found
BdsDxe: No bootable option or device was found.
BdsDxe: Press any key to enter the Boot Manager Menu.



Version-Release number of selected component (if applicable):


4.14

How reproducible:

10%

Steps to Reproduce:

1. 
2.
3.

Actual results:


Expected results:


Additional info:


Example failures:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-upgrade-ovn-ipv6/1665428719952465920

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-upgrade-ovn-ipv6/1664711616538611712

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-upgrade-ovn-ipv6/1664645418744549376

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-upgrade-ovn-ipv6/1663915360878858240




Description of problem:

The title of the Overview page has changed to "Cluster · Red Hat OpenShift" instead of the "Overview · Red Hat OpenShift" title we have had since 4.11.

Version-Release number of selected component (if applicable):

OCP 4.14

How reproducible:

Install OpenShift 4.14, login to management console and navigate to Home / Overview

Steps to Reproduce:

1. Install OpenShift 4.14 
2. login to management console 
3. Navigate to Home / Overview 
4. Load the HTML DOM and verify the HTML node <title>; title is also visible when hovering on the opened tab in Chrome or Firefox

Actual results:

Cluster · Red Hat OpenShift

HTML node: <title data-telemetry="Cluster" data-react-helmet="data-telemetry" xpath="1">Cluster · Red Hat OpenShift</title>

Expected results:

Overview · Red Hat OpenShift

Additional info:

Starting from 4.11 the title on that page was always "Overview · Red Hat OpenShift". UI tests rely on consistent titles to detect the currently opened web page.

* It is important to notice that the change also affects accessibility, since navigating with text-to-speech is a common accessibility feature.

Please review the following PR: https://github.com/openshift/machine-api-provider-azure/pull/53

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When changing channels it's possible that multiple new conditional update risks will need to be evaluated. For instance, a cluster running 4.10.34 in a 4.10 channel today only has to evaluate `OpenStackNodeCreationFails`, but when the channel is changed to a 4.11 channel, multiple new risks require evaluation, and the evaluation of new risks is throttled at one every 10 minutes. This means that if there are three new risks it may take up to 30 minutes after the channel has changed for the full set of conditional updates to be computed. This leads to a perception that no update paths are recommended, because most users will not wait 30 minutes; they expect immediate feedback.

Version-Release number of selected component (if applicable):

4.10.z, 4.11.z, 4.12, 4.13

How reproducible:

100% 

Steps to Reproduce:

1. Install 4.10.34
2. Switch from stable-4.10 to stable-4.11
3. 

Actual results:

Observe no recommended updates for 10-20 minutes because all available paths to 4.11 have a risk associated with them

Expected results:

Risks are computed in a timely manner for an interactive UX, lets say < 10s

Additional info:

This was intentional in the design: we didn't want risks to continuously re-evaluate or overwhelm the monitoring stack. However, we didn't anticipate that we'd have a long-standing pile of risks, or realize how confusing the user experience would be.

We intend to work around this in the deployed fleet by converting older risks from `type: promql` to `type: Always`, avoiding the evaluation period but preserving the notification. While this may lead customers to believe they're exposed to a risk they may not be, as long as the set of outstanding risks to the latest version is limited to no more than one it's likely no one will notice. All 4.10 and 4.11 clusters currently have a clear path toward a relatively recent 4.10.z or 4.11.z with no more than one risk to be evaluated.
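Schematically, a converted risk entry in the graph data would look something like the following; the field names follow the blocked-edges convention, while the versions, URL, name and wording here are placeholders:

to: 4.11.13
from: 4\.10\..*
url: https://access.redhat.com/solutions/example
name: ExampleRisk
message: Clusters on this update path may be affected by ExampleRisk.
matchingRules:
  - type: Always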

Description of problem:

ControlPlaneMachineSet Machines are considered Ready once the underlying MAPI machine is Running.
This should not be a sufficient condition, as the Node linked to that Machine should also be Ready for the overall CPMS Machine to be considered Ready.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Kubernetes 1.27 removes long deprecated --container-runtime flag, see https://github.com/kubernetes/kubernetes/pull/114017

To ensure the upgrade path between 4.13 to 4.14 isn't affected we need to backport the changes to both 4.14 and 4.13.

Description of problem:

Print preview of Topology presents incorrect layout

Version-Release number of selected component (if applicable):

4.12.0

How reproducible:

Always

Steps to Reproduce:

1. Have 2 KNative/Serverless Functions deployed (in my case 1 is Quarkus and another is Spring Boot)
2. In the Topology UI, observe that their snippets display properly within the Graph view area.
3. Now switch to List view.
4. In my case the List view shows the following short list of items:
Broker
  default
Operator Backed Service
DW terminal-avby87
  D workspaceb5975d64dbc54983
Service
KSVC caller-function
  REV caller-function-00002
Service
KSVC callme-function
  REV callme-function-00001
5. Now, using the Chrome browser, press Ctrl+P, i.e. Print preview.
6. Observe that even in Landscape mode only the items up to the workspace entry are displayed, with no further pages/info.

Actual results:

Incomplete Topology info from List view in Print Preview

Expected results:

Full and accurate Topology info from List view in Print Preview

Additional info:

 

Description of the problem:

With a cluster of 3 masters and 5 additional disks attached across the 3 masters, checking the storage device sets for the operator shows only 3 storage devices and not, as expected, the 5 additional disks.

How reproducible:

80%

OCP 4.12, OCS 4.12.1

also reproduces on OCP 4.11

Steps to reproduce:

1. Create a Cluster with 3 master nodes

2. attach 2 additional disks to master1 , 2 additional disks to master 2 , 1 additional disk to master 3

3. check count of storage devices on operator

Actual results:
The operator shows a device set count of 3.

Expected results:
The device set count should equal the number of valid additional attached disks (= 5).

It is possible, due to the way that the UI is currently implemented, that a user may be able to submit a manifest with no content.
We need to filter manifests before they are applied to ensure that any manifests that are empty (lack at least one key/value) are not applied.

A good suggested location to look at might be

https://github.com/openshift/assisted-service/blob/master/internal/ignition/ignition.go#L402-L409

This is a clone of issue OCPBUGS-19512. The following is the description of the original issue:

OCPBUGS-5469 and backports began prioritizing later target releases, but we still wait 10m between different PromQL evaluations while evaluating conditional update risks.  This ticket is tracking work to speed up cache warming, and allows changes that are too invasive to be worth backporting.

Definition of done:

  • When presented with new risks, the CVO will initially evaluate one PromQL expression every second or so, instead of waiting 10m between different evaluations.  Each PromQL expression will still only be evaluated once every hour or so, to avoid excessive load on the PromQL engine.

Acceptance Criteria:

  • After changing the channel and receiving a new graph conditional risks are evaluated as quickly as possible, ideally less than 500ms per unique risk

Request for sending data via telemetry

The goal is to collect metrics about the number of LIST and WATCH requests to the apiserver, because it will allow us to measure the deployment progress of the API streaming feature. The new feature will replace the use of LIST requests with WATCH.

apiserver_list_watch_request_total:rate:sum 

apiserver_list_watch_request_total:rate:sum represents the rate of change for the LIST and WATCH requests over a 5 minute period.

Labels

  • verb, possible values are: LIST, WATCH

The cardinality of the metric is at most 2.
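A schematic recording-rule sketch of how this could be computed; the source metric and label matcher below are assumptions rather than the final implementation:

groups:
  - name: apiserver-list-watch.rules
    rules:
      - record: apiserver_list_watch_request_total:rate:sum
        expr: sum by (verb) (rate(apiserver_request_total{verb=~"LIST|WATCH"}[5m]))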

This is a clone of issue OCPBUGS-19419. The following is the description of the original issue:

Description of problem:

The wrong additional trusted bundle name is reconciled to the CPO deployment in the HCC - https://github.com/openshift/hypershift/blob/369e0f18212b658e0bd6ebe3b9f6f387008ec5bd/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go#L1291. A user would most likely never run into this when using the CLI to create the cluster, because the CLI defaults to the right name, user-ca-bundle: https://github.com/openshift/hypershift/blob/5ec788802880d550f2164a2210cf90e8d0f32225/api/fixtures/example.go#L501

Version-Release number of selected component (if applicable):

 

How reproducible:

Every time

Steps to Reproduce:

1. Create a ConfigMap with a name other than user-ca-bundle and reference it in the additional-trust-bundle field of a HostedCluster yaml spec (see the schematic snippet below)
2. Deploy the HC yaml spec
3. Observe the CPO fail to deploy
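A schematic reproduction snippet; most required HostedCluster fields are omitted and the names are placeholders:

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
spec:
  additionalTrustBundle:
    name: my-custom-ca-bundle   # any name other than user-ca-bundle triggers the failure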

Actual results:

CPO fails to deploy

Expected results:

CPO should deploy

Additional info:

On https://github.com/openshift/hypershift/blob/369e0f18212b658e0bd6ebe3b9f6f387008ec5bd/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go#L1291, we should pass the name we reconciled here (i.e. user-ca-bundle) https://github.com/openshift/hypershift/blob/248df10f8606696cd284521efccc31471445aa63/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go#L1755 

Related to TRT-849, we want to write a test to see how often this is happening before we undertake a major effort to get to the bottom of it.

The test will need to process disruption across all backends, look for DNS lookup disruptions, and then see if we have overlap with non-DNS lookup disruptions within those timeframes.

We have some precedent for similar code in KubePodNotReady alerts that we handle differently if in proximity to other intervals.

The test should flake; we can then see how often it's happening in Sippy and on what platforms. With SQL we could likely pinpoint certain build clusters as well.

After https://issues.redhat.com//browse/HOSTEDCP-1062, the `olm-collect-profiles` CronJob pods did not get the NeedManagementKASAccessLabel label and thus fail:

# oc logs olm-collect-profiles-28171952-2v8gn
Error: Get "https://172.29.0.1:443/api?timeout=32s": dial tcp 172.29.0.1:443: i/o timeout

Description of problem:

The GCP e2-custom-* instance types are not supported by our E2E test framework.
Now that Test Platform has started using those instance types, we are seeing permafailing E2E job runs in our CPMS E2E periodic tests.

Error sample:

• [FAILED] [285.539 seconds]
ControlPlaneMachineSet Operator With an active ControlPlaneMachineSet and the instance type is changed [BeforeEach] should perform a rolling update [Periodic]
  [BeforeEach] /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/test/e2e/periodic_test.go:39
  [It] /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/test/e2e/periodic_test.go:43

  [FAILED] provider spec should be updated with bigger instance size
  Expected success, but got an error:
      <*fmt.wrapError | 0xc000358380>:
      failed to get next instance size: instance type did not match expected format: e2-custom-6-16384
      {
          msg: "failed to get next instance size: instance type did not match expected format: e2-custom-6-16384",
          err: <*fmt.wrapError | 0xc000358360>{
              msg: "instance type did not match expected format: e2-custom-6-16384",
              err: <*errors.errorString | 0xc0001489f0>{
                  s: "instance type did not match expected format",
              },
          },
      }

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Use e2-custom in GCP in a cluster, run CPMSO E2E periodics
2.
3.

Actual results:

Permafailing E2Es

Expected results:

Successful E2Es

Additional info:

 

This is a clone of issue OCPBUGS-23472. The following is the description of the original issue:

ignition-server-proxy pods fail to start after y-stream upgrade because the deployment is configured with a ServiceAccount, set in 4.13, that was deleted in 4.14 in PR https://github.com/openshift/hypershift/pull/2778. The 4.14 reconciliation does not unset the ServiceAccount that was set in 4.13.

Description of problem:

Extracting the CLI on Darwin from a multi-arch payload leads to "filtered all images from manifest list".

Version-Release number of selected component (if applicable):

Tested with oc4.11

How reproducible:

Always on Darwin machines

Steps to Reproduce:

1.oc adm release extract --command=oc quay.io/openshift-release-dev/ocp-release:4.11.4-multi -v5

Actual results:

I0909 18:37:28.591323   37669 config.go:127] looking for config.json at /Users/lwan/.docker/config.json
I0909 18:37:28.591601   37669 config.go:135] found valid config.json at /Users/lwan/.docker/config.json
Warning: the default reading order of registry auth file will be changed from "${HOME}/.docker/config.json" to podman registry config locations in the future version of oc. "${HOME}/.docker/config.json" is deprecated, but can still be used for storing credentials as a fallback. See https://github.com/containers/image/blob/main/docs/containers-auth.json.5.md for the order of podman registry config locations.
I0909 18:37:30.391895   37669 client_mirrored.go:174] Attempting to connect to quay.io/openshift-release-dev/ocp-release
I0909 18:37:30.696483   37669 client_mirrored.go:412] get manifest for sha256:53679d92dc0aea8ff6ea4b6f0351fa09ecc14ee9eda1b560deeb0923ca2290a1 served from registryclient.retryManifest{ManifestService:registryclient.manifestServiceVerifier{ManifestService:(*client.manifests)(0x14000a36330)}, repo:(*registryclient.retryRepository)(0x14000f46e80)}: <nil>
I0909 18:37:30.696738   37669 manifest.go:405] Skipping image sha256:fcf4d95df9a189527453d8961a22a3906514f5ecbb05afbcd0b2cdd212aab1a2 for manifestlist.PlatformSpec{Architecture:"amd64", OS:"linux", OSVersion:"", OSFeatures:[]string(nil), Variant:"", Features:[]string(nil)} from quay.io/openshift-release-dev/ocp-release:4.11.4-multi
I0909 18:37:30.696843   37669 manifest.go:405] Skipping image sha256:1992a4713410b7363ae18b0557a7587eb9e0d734c5f0f21fb1879196f40233a3 for manifestlist.PlatformSpec{Architecture:"ppc64le", OS:"linux", OSVersion:"", OSFeatures:[]string(nil), Variant:"", Features:[]string(nil)} from quay.io/openshift-release-dev/ocp-release:4.11.4-multi
I0909 18:37:30.696869   37669 manifest.go:405] Skipping image sha256:3698082cd66e90d2b79b62d659b4e7399bfe0b86c05840a4c31d3197cdac4bfa for manifestlist.PlatformSpec{Architecture:"s390x", OS:"linux", OSVersion:"", OSFeatures:[]string(nil), Variant:"", Features:[]string(nil)} from quay.io/openshift-release-dev/ocp-release:4.11.4-multi
I0909 18:37:30.697106   37669 manifest.go:405] Skipping image sha256:15fc18c81f053cad15786e7a52dc8bff29e647ea642b3e1fabf2621953f727eb for manifestlist.PlatformSpec{Architecture:"arm64", OS:"linux", OSVersion:"", OSFeatures:[]string(nil), Variant:"", Features:[]string(nil)} from quay.io/openshift-release-dev/ocp-release:4.11.4-multi
I0909 18:37:30.697570   37669 workqueue.go:143] about to send work queue error: unable to read image quay.io/openshift-release-dev/ocp-release:4.11.4-multi: filtered all images from manifest list
error: unable to read image quay.io/openshift-release-dev/ocp-release:4.11.4-multi: filtered all images from manifest list

Expected results:

The darwin/$(uname -m) cli is extracted

Additional info:

Are we re-using some function from the `oc mirror` feature to select the manifest to use? It looks like it is looking for a "darwin/$(uname -m)" manifest and filtering out all the available linux manifests.

Description of problem:

A runtime error is encountered when running the console backend in off-cluster mode against only one cluster (non-multicluster configuration)

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Follow readme instructions for running bridge locally
2.
3.

Actual results:

Bridge crashes with a runtime error

Expected results:

Bridge should run normally

Additional info:

 

Description of problem:

The network-tools image stream is missing in the cluster samples. It is needed for CI tests.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

An upstream partial fix to logging means that the BMO log now contains a mixture of structured and unstructured logs, making it impossible to read with the structured log parsing tool (bmo-log-parse) we use for debugging customer issues.
This is fixed upstream by https://github.com/metal3-io/baremetal-operator/pull/1249, which will get picked up automatically in 4.14 but which needs to be backported to 4.13.

This is a clone of issue OCPBUGS-22703. The following is the description of the original issue:

Description of problem:

The following presubmit jobs for Local Zones have been perma-failing since August:
- e2e-aws-ovn-localzones: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-installer-master-e2e-aws-ovn-localzones?buildId=1716457254460329984
- e2e-aws-ovn-shared-vpc-localzones: https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-installer-master-e2e-aws-ovn-shared-vpc-localzones

Investigating, we can see common failures in the tests '[sig-network] can collect <poller_name> poller pod logs', causing most of the jobs to not complete correctly.

Exploring the code, I can see it was added recently, around August, which matches when the failures started.

Running pods on instances located in a Local Zone ("edge nodes") requires tolerating the taint "node-role.kubernetes.io/edge". I am not sure if I am looking in the correct place, but it seems only the master taint is tolerated: https://github.com/openshift/origin/blob/master/pkg/monitortests/network/disruptionpodnetwork/host-network-target-deployment.yaml#L42

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

always

Steps to Reproduce:

trigger the job:
1. open a PR on installer
2. run the job
3. check failed tests '[sig-network] can collect <poller_name> poller pod logs' 

Example of 4.15 blocked feature PR (Wavelength Zones): https://github.com/openshift/installer/pull/7369#issuecomment-1783699175

Actual results:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/7590/pull-ci-openshift-installer-master-e2e-aws-ovn-localzones/1715075142427611136
{  1 pods lacked sampler output: [pod-network-to-pod-network-disruption-poller-d94fb55db-9qfpz]}

E1018 22:06:34.773866       1 disruption_backend_sampler.go:496] not finished writing all samples (1 remaining), but we're told to close
E1018 22:06:34.774669       1 disruption_backend_sampler.go:496] not finished writing all samples (1 remaining), but we're told to close

Expected results:

Should the monitor pods be scheduled on edge nodes?
How can we track job failures for new monitor tests?

Additional info:

Edge nodes have NoSchedule taints applied by default; to run monitor pods on those nodes you need to tolerate the taint "node-role.kubernetes.io/edge", as sketched below.

See the enhancement for more information: https://github.com/openshift/enhancements/blob/master/enhancements/installer/aws-custom-edge-machineset-local-zones.md#user-workload-deployments

Looking at the must-gather of job 1716457254460329984, you can see the monitor pods were not scheduled due to the missing tolerations:

$ grep -rni pod-network-to-pod-network-disruption-poller-7c97cd5d7-t2mn2 \
  1716457254460329984-must-gather/09abb0d6fc08ee340563e6e11f5ceafb42fb371e50ab6acee6764031062525b7/namespaces/openshift-kube-scheduler/pods/ \
  | awk -F'] "' '{print$2}' | sort | uniq -c
    215 Unable to schedule pod; no fit; waiting" pod="e2e-pod-network-disruption-test-59s5d/pod-network-to-pod-network-disruption-poller-7c97cd5d7-t2mn2" 
err="0/7 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/edge: }, 
6 node(s) didn't match pod anti-affinity rules. preemption: 0/7 nodes are available: 
1 Preemption is not helpful for scheduling, 6 No preemption victims found for incoming pod.."
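
For illustration, a minimal sketch of the toleration the poller pods would need in their pod spec so they can also land on edge nodes; the taint key comes from the enhancement linked above, and the rest of the snippet is hypothetical:

spec:
  tolerations:
    # Tolerate the default NoSchedule taint applied to Local Zone ("edge") nodes
    - key: node-role.kubernetes.io/edge
      operator: Exists
      effect: NoSchedule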

Description of problem:


Facing the same issue as JIRA[1] in OCP 4.12 and for the backport this bug solution to the OCP 4.12

JIRA[1]: https://issues.redhat.com/browse/OCPBUGS-14064

Port 9447 is exposed from the cluster on one of the control nodes and uses weak ciphers and TLS 1.0/TLS 1.1, which is incompatible with the security standards for our product release. Either we should be able to disable this port, or the ciphers and TLS version should be updated to meet the security standards; as you are aware, TLS 1.0 and TLS 1.1 are quite old and already deprecated.

We confirmed that FIPS was enabled during cluster deployment by passing the key-value pair in the config file:
fips: true

On JIRA[1] it is suggested to open a separate Bug for backporting. 

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


This is a clone of issue OCPBUGS-13664. The following is the description of the original issue:

Description of problem:

There is no clear error log when creating an STS cluster with a KMS key that does not have the installer role in its policy

Version-Release number of selected component (if applicable):

 

How reproducible:

always

Steps to Reproduce:

1. Prepare a KMS key with the aws command
   aws kms create-key --tags TagKey=Purpose,TagValue=Test --description "kms Key"
2. Create an STS cluster with the KMS key

rosa create cluster --cluster-name ying-k1 --sts --role-arn arn:aws:iam::301721915996:role/ying16-Installer-Role --support-role-arn arn:aws:iam::301721915996:role/ying16-Support-Role --controlplane-iam-role arn:aws:iam::301721915996:role/ying16-ControlPlane-Role --worker-iam-role arn:aws:iam::301721915996:role/ying16-Worker-Role --operator-roles-prefix ying-k1-e2g3 --oidc-config-id 23ggvdh2jouranue87r5ujskp8hctisn --region us-west-2 --version 4.12.15 --replicas 2 --compute-machine-type m5.xlarge --machine-cidr 10.0.0.0/16 --service-cidr 172.30.0.0/16 --pod-cidr 10.128.0.0/14 --host-prefix 23 --kms-key-arn arn:aws:kms:us-west-2:301721915996:key/c60b5a31-1a5c-4d73-93ee-67586d0eb90d 

Actual results:

It fails. Here is the install log:
http://pastebin.test.redhat.com/1100008

Expected results:

There should be a detailed error message for the KMS key that has no installer role

Additional info:

It can be successful if the installer role ARN is set in the KMS key policy:
  {
    "Version": "2012-10-17",
    "Id": "key-default-1",
    "Statement": [
        {
            "Sid": "Enable IAM User Permissions",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                   "arn:aws:iam::301721915996:role/ying16-Installer-Role",
                    "arn:aws:iam::301721915996:root"
                ]
            },
            "Action": "kms:*",
            "Resource": "*"
        }
    ]
}
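
For reference, a key policy like the one above can be attached to the key with the AWS CLI; the policy file name below is a placeholder and the key ID is the one from the rosa command above:

# Attach the key policy (saved locally as key-policy.json) to the KMS key
aws kms put-key-policy \
  --key-id c60b5a31-1a5c-4d73-93ee-67586d0eb90d \
  --policy-name default \
  --policy file://key-policy.json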

This is a clone of issue OCPBUGS-20161. The following is the description of the original issue:

Description of problem:

HostedClusters with a .status.controlPlaneEndpoint.port: 443 unexepectedly also expose the KAS on port 6443. This causes four security group rules to be consumed per LoadBalancer service (443/6443 for router and 443/6443 for private-router) instead of just two (443 for router and 443 for private-router). This directly impacts the number of HostedClusters on a Management Cluster since there is a hard cap of 200 security group rules per security group.

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

100%

Steps to Reproduce:

1. Create a HostedCluster resulting in its .status.controlPlaneEndpoint.port: 443
2. Observe that the router/private-router LoadBalancer services expose both ports 6443 and 443 

Actual results:

The router/private-router LoadBalancer services expose both ports 6443 and 443 

Expected results:

The router/private-router LoadBalancer services exposes only port 443 

Additional info:

 

Description of problem:

In the metric `cluster:capacity_cpu_cores:sum` there is an attribute label `label_node_role_kubernetes_io` that has `infra` or `master`. There is no label for `worker`. If the infra nodes are missing this label, they get added into the "unlabeled" worker nodes. 

For example:
This cluster has all three types `cluster:capacity_cpu_cores:sum{_id="0702a3b1-c2d8-427f-865d-3ce7dc3a2be7"}`

But this cluster has the infra and worker merged. `cluster:capacity_cpu_cores:sum{_id="0e60ac76-d61a-4e6d-a4f3-269110b6b1f9"}`


If I count clusters that have sockets with infra but capacity_cpu without infra, I get 7,617 clusters for 2023-03-15.

If I count clusters that have sockets with infra but capacity_cpu with infra, I get 2,015 clusters for 2023-03-15.

That means that there are 5,602 clusters that are missing the infra label. 

This metric is used to identify the vCPU/CPU count that is used in TeleSense. This is presented to the Sales teams and upper management. If there is another metric we should use, please let me know. Otherwise, this needs to be fixed. 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

refer to Slack thread: https://redhat-internal.slack.com/archives/C0VMT03S5/p1678967355450719

Description of problem: 

 

monitoringPlugin tolerations not working

 

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

apply monitoringPlugin tolerations to cm `cluster-monitoring-config`
example:
...  
    monitoringPlugin:
      tolerations:
        - key: "key1"
          operator: "Equal"
          value: "value1"
          effect: "NoSchedule"

Actual results:

the cm is applied but does not take effect on the deployment

Expected results:

able to see the tolerations applied to the deployment/pod

Additional info:

the same applies to NodeSelector and TopologySpreadConstraints
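
For context, a minimal sketch of the full ConfigMap that the reproduction step applies; the namespace and data key follow the standard cluster-monitoring-config layout, and the toleration values are the ones from the steps above:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    monitoringPlugin:
      tolerations:
        - key: "key1"
          operator: "Equal"
          value: "value1"
          effect: "NoSchedule"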

Description of problem:

Unable to successfully create a HyperShift KubeVirt HostedCluster on bare metal; the control plane's pod/importer-prime-xxx cannot become ready

Version-Release number of selected component (if applicable):

4.14

How reproducible:

100%

Steps to Reproduce:

1. HyperShift install operator
2. HyperShift create cluster KubeVirt xxx

Actual results:

➜  oc get pod -n clusters-3d9ec3c7e495f1c58da1  | grep "importer-prime"
importer-prime-90175dc9-21bf-4f13-a021-6c42a2e19652   1/2     Error              16 (5m13s ago)   57m
importer-prime-9f153661-1c2c-4b61-84fd-0a2d83f30699   1/2     Error              16 (5m4s ago)    57m
importer-prime-cb817383-58bd-4480-a7e1-49ae42368cae   1/2     CrashLoopBackOff   15 (4m51s ago)   57m

➜  oc logs importer-prime-90175dc9-21bf-4f13-a021-6c42a2e19652 -c importer -n clusters-3d9ec3c7e495f1c58da1

I0728 18:41:20.106447       1 importer.go:103] Starting importer
E0728 18:41:20.107346       1 importer.go:133] exit status 1, blockdev: cannot open /dev/cdi-block-volume: Permission denied

kubevirt.io/containerized-data-importer/pkg/util.GetAvailableSpaceBlock
        /remote-source/app/pkg/util/util.go:136
kubevirt.io/containerized-data-importer/pkg/util.GetAvailableSpaceByVolumeMode
        /remote-source/app/pkg/util/util.go:106
main.main
        /remote-source/app/cmd/cdi-importer/importer.go:131
runtime.main
        /usr/lib/golang/src/runtime/proc.go:250
runtime.goexit
        /usr/lib/golang/src/runtime/asm_amd64.s:1598
➜  oc get hostedcluster -n clusters 3d9ec3c7e495f1c58da1 -ojsonpath='{.status.version.desired}' | jq
{
  "image": "registry.build01.ci.openshift.org/ci-op-ywf2rxrx/release@sha256:940a0463d1203888fb4e5fa4a09b69dc4eb3cc5d70dee22e1155c677aafca197",
  "version": "4.14.0-0.ci-2023-07-28-090906"
}
➜  oc get hostedcluster -n clusters 3d9ec3c7e495f1c58da1                                    
NAME                   VERSION   KUBECONFIG                              PROGRESS   AVAILABLE   PROGRESSING   MESSAGE
3d9ec3c7e495f1c58da1             3d9ec3c7e495f1c58da1-admin-kubeconfig   Partial    True        False         The hosted control plane is available
➜  oc get clusterversion version -ojsonpath='{.status.desired.image}'
registry.build01.ci.openshift.org/ci-op-ywf2rxrx/release@sha256:940a0463d1203888fb4e5fa4a09b69dc4eb3cc5d70dee22e1155c677aafca197                                                       
➜  oc get vmi -A                                             
No resources found 

Expected results:

All pods on the control plane should be ready

Additional info:

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/41772/rehearse-41772-periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-kubevirt-baremetalds-conformance/1684954151244533760

Please review the following PR: https://github.com/openshift/cluster-kube-scheduler-operator/pull/478

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When creating an OCP cluster with Nutanix infrastructure and using DHCP instead of IPAM network config, the Hostname of the VM is not set by DHCP. In this case we need to inject the desired hostname through cloud-init for both control-plane and worker nodes.

Version-Release number of selected component (if applicable):

 

How reproducible:

Reproducible when creating an OCP cluster with Nutanix infrastructure and using DHCP instead of IPAM network config.

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

When you migrate a HostedCluster, the AWSEndpointService from the old management cluster conflicts with the new management cluster. The AWSPrivateLink controller does not perform any validation when this happens, and that validation is needed to make the Disaster Recovery HC migration work. The issue shows up when the nodes of the HostedCluster cannot join the new management cluster because the AWSEndpointServiceName still points to the old one.

Version-Release number of selected component (if applicable):

4.12
4.13
4.14

How reproducible:

Follow the migration procedure from the upstream documentation and the nodes in the destination HostedCluster will remain in NotReady state.

Steps to Reproduce:

1. Setup a management cluster with the 4.12-13-14/main version of the HyperShift operator.
2. Run the in-place node DR Migrate E2E test from this PR https://github.com/openshift/hypershift/pull/2138:
bin/test-e2e \
  -test.v \
  -test.timeout=2h10m \
  -test.run=TestInPlaceUpgradeNodePool \
  --e2e.aws-credentials-file=$HOME/.aws/credentials \
  --e2e.aws-region=us-west-1 \
  --e2e.aws-zones=us-west-1a \
  --e2e.pull-secret-file=$HOME/.pull-secret \
  --e2e.base-domain=www.mydomain.com \
  --e2e.latest-release-image="registry.ci.openshift.org/ocp/release:4.13.0-0.nightly-2023-03-17-063546" \
  --e2e.previous-release-image="registry.ci.openshift.org/ocp/release:4.13.0-0.nightly-2023-03-17-063546" \
  --e2e.skip-api-budget \
  --e2e.aws-endpoint-access=PublicAndPrivate

Actual results:

The nodes stay in NotReady state

Expected results:

The nodes should join the migrated HostedCluster

Additional info:

 

Description of problem:

Cluster does not finish rolling out on a 4.13 management cluster because of pod security constraints.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1.Install 4.14 hypershift operator on a recent 4.13 mgmt cluster
2.Create an AWS PublicAndPrivate hosted cluster on that hypershift cluster

Actual results:

Hosted cluster stalls rollout because the private router never gets created

Expected results:

Hosted cluster comes up successfully

Additional info:

Pod security enforcement is preventing the private router from getting created.

Description of problem:

OCM-o does not support obtaining the log verbosity from the OpenShiftControllerManager operatorLogLevel field

Version-Release number of selected component (if applicable):

 

How reproducible:

Modify OpenShiftControllerManager.operatorLogLevel (for example via a patch like the one sketched below), and the OCM-o operator will not emit the corresponding logs.
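
A minimal sketch of such a modification, assuming the standard singleton resource name `cluster` and the `Debug` log level:

# Raise the operator log level on the OpenShiftControllerManager operator config
oc patch openshiftcontrollermanager cluster --type=merge -p '{"spec":{"operatorLogLevel":"Debug"}}'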

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Using an agent-config.yaml with DHCP network mode (i.e. without the 'hosts' property) throws this error when loading the config-image: 
load-config-iso.sh[1656]: Expected file /etc/assisted/manifests/nmstateconfig.yaml is not in archive

Version-Release number of selected component (if applicable):

4.14 (master)

How reproducible:

100%

Steps to Reproduce:

1. Create an agent-config.yaml without 'hosts' property.
2. Generate a config-image.
3. Boot the machine and mount the ISO.

Actual results:

Installation can't continue due to an error on config-iso load:
load-config-iso.sh[1656]: Expected file /etc/assisted/manifests/nmstateconfig.yaml is not in archive

Expected results:

The installation should continue as normal.

Additional info:

The issue is probably due to a fix introduced for static networking:
https://issues.redhat.com/browse/OCPBUGS-15637
I.e. since '/etc/assisted/manifests/nmstateconfig.yaml' was added to GetConfigImageFiles, it is now mandatory in load-config-iso.sh (see the 'copy_archive_contents' function).

The failure was missed on dev-scripts tests probably due to this issue: https://github.com/openshift-metal3/dev-scripts/pull/1551

This is a clone of issue OCPBUGS-18137. The following is the description of the original issue:

Description of problem:

When a workload includes a node selector term on the label kubernetes.io/arch and the allowed values do not include amd64, the auto scaler does not trigger the scale out of a valid, non-amd64, machine set if its current replicas are 0 and (for 4.14+) no architecture capacity annotation is set (ref MIXEDARCH-129).

The issue is due to https://github.com/openshift/kubernetes-autoscaler/blob/f0ceeacfca57014d07f53211a034641d52d85cfd/cluster-autoscaler/cloudprovider/utils.go#L33

This bug should be considered at first on clusters having the same architecture for the control plane and the data plane.

In the case of multi-arch compute clusters, there is probably no alternative to letting the capacity annotation be properly set in the machine set, either manually or by the cloud provider actuator, as already discussed in the MIXEDARCH-129 work; otherwise we fall back to the control plane architecture.

Version-Release number of selected component (if applicable):

- ARM64 IPI on GCP 4.14
- ARM64 IPI on Aws and Azure <=4.13
- In general, non-amd64 single-arch clusters supporting autoscale from 0

How reproducible:

Always

Steps to Reproduce:

1. Create an arm64 IPI cluster on GCP
2. Set one of the machinesets to have 0 replicas: 
    oc scale -n openshift-machine-api machineset/adistefa-a1-zn8pg-worker-f --replicas=0
3. Deploy the default autoscaler
4. Deploy the machine autoscaler for the given machineset
5. Deploy a workload with node affinity to arm64-only nodes, large resource requests, and a sufficient number of replicas. 

Actual results:

From the pod events: 

pod didn't trigger scale-up: 1 node(s) didn't match Pod's node affinity/selector

Expected results:

The cluster autoscaler scales the machineset with 0 replicas in order to provide resources for the pending pods.

Additional info:

---
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec: {}
---
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-us-east-1a
  namespace: openshift-machine-api
spec:
  minReplicas: 0
  maxReplicas: 12
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: adistefa-a1-zn8pg-worker-f
---
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: openshift-machine-api
  name: 'my-deployment'
  annotations: {}
spec:
  selector:
    matchLabels:
      app: name
  replicas: 3
  template:
    metadata:
      labels:
        app: name
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                - key: kubernetes.io/arch
                  operator: In
                  values:
                    - "arm64"
      containers:
        - name: container
          image: >-
            image-registry.openshift-image-registry.svc:5000/openshift/httpd:latest
          ports:
            - containerPort: 8080
              protocol: TCP
          env: []
          resources:
              requests:
                cpu: "2"
      imagePullSecrets: []
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  paused: false

Description of the problem:

Events search should not be case sensitive 

 

How reproducible:

100%

 

Steps to reproduce:

1. On UI View Cluster Events

2. Enter text on "Filter by text" field. (eg. "success" or "Success" )

 

Actual results:

Events filter is case sensitive. 

See screenshots enclosed

 

Expected results:

Events filter should not be case sensitive

This is a clone of issue OCPBUGS-20364. The following is the description of the original issue:

Description of problem:

There is no instance type validation check under defaultMachinePlatform.
For example, set platform.azure.defaultMachinePlatform.type to Standard_D11_v2, which does not support PremiumIO, then create manifests:
 
# az vm list-skus --location southcentralus --size Standard_D11_v2 --query "[].capabilities[?name=='PremiumIO'].value" -otsv
False

install-config.yaml:
-------------------
platform:
  azure:
    defaultMachinePlatform:
      type: Standard_D11_v2
    baseDomainResourceGroupName: os4-common
    cloudName: AzurePublicCloud
    outboundType: Loadbalancer
    region: southcentralus

succeeded to create manifests:
$ ./openshift-install create manifests --dir ipi
INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json" 
INFO Consuming Install Config from target directory 
INFO Manifests created in: ipi/manifests and ipi/openshift 

while get expected error when setting type under compute:
$ ./openshift-install create manifests --dir ipi
INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json" 
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[0].platform.azure.osDisk.diskType: Invalid value: "Premium_LRS": PremiumIO not supported for instance type Standard_D11_v2

The same situation applies to the field vmNetworkingType under defaultMachinePlatform; instance type Standard_B4ms does not support Accelerated networking.
# az vm list-skus --location southcentralus --size Standard_B4ms --query "[].capabilities[?name=='AcceleratedNetworkingEnabled'].value" -otsv
False

install-config.yaml
----------------
platform:
  azure:
    defaultMachinePlatform:
      type: Standard_B4ms
      vmNetworkingType: "Accelerated" 

The install still succeeds in creating the manifest files; it should exit with an error, like the one produced when type and vmNetworkingType are set under compute:
ERROR failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: compute[0].platform.azure.vmNetworkingType: Invalid value: "Accelerated": vm networking type is not supported for instance type Standard_B4ms 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-08-220853

How reproducible:

always on all supported version

Steps to Reproduce:

1. configure invalid instance type ( e.g unsupported PremiumIO) under defaultMachinePlatform in install-config.yaml
2. create manifests
3.

Actual results:

installer creates manifests successfully.

Expected results:

The installer should exit with an error, matching the behavior when an invalid instance type is configured under compute or controlPlane.

Additional info:

 

Description of the problem:

When a 9.2-based live ISO is used in the agentserviceconfig, after booting into the CD the spoke console gets stuck acquiring the live PXE rootfs with a "could not resolve host" error. 

 

It seems the DNS server configured in nmstate is not applied to spoke.

 

How reproducible:

100% 

 

Steps to reproduce:

  1. configure agentserviceconfig to use 4.13 9.2 live iso. (413.92.202303190222-0)

2. install SNO via ZTP 

3. Monitor install CRs on hub

Actual results:

  • agentclusterinstall stuck at "insufficient" state
  • The spoke console shows "could not resolve host" when attempting to download the rootfs image (screenshot attached)

Expected results:

  • install succeeded

 

Extra info:

  • ACM version: latest 2.7.3 downstream snapshot  
  • Did not encounter this specific issue if switch to 8.6 based 4.13 live iso in agentserviceconfig.
  • However, even though we can bypass this step, a similar issue happens after booting into the HD, which has a 9.2-based OS: the DNS server on the spoke is different from what is configured in nmstate, causing DNS resolution to fail. 
    • And we did not see this issue when using the ACM 2.7.2 snapshot from about 3 weeks ago. We were able to install the same cluster using the same networking configs with a 4.13 9.2 build (8.6 live iso). 

Description of problem:

When deploying a 4.14 spoke, the agentclusterinstall is stuck at the finalizing stage

clusterversions on the spoke report "Unable to apply 4.14.0-0.ci-2023-06-13-083232: the cluster operator monitoring is not available"

Please note: the console operator is disabled on purpose; this is needed in the telco case to reduce platform resource usage

[kni@registry.kni-qe-28 ~]$ oc get clusterversions.config.openshift.io -A
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          46m     Unable to apply 4.14.0-0.ci-2023-06-13-083232: the cluster operator monitoring is not available
[kni@registry.kni-qe-28 ~]$ oc get clusterversions.config.openshift.io -n version -o yaml 
apiVersion: v1
items:
- apiVersion: config.openshift.io/v1
  kind: ClusterVersion
  metadata:
    creationTimestamp: "2023-06-13T15:16:32Z"
    generation: 2
    name: version
    resourceVersion: "20061"
    uid: f8fc0c3e-009d-4d86-a05d-2fd0aba59528
  spec:
    capabilities:
      additionalEnabledCapabilities:
      - marketplace
      - NodeTuning
      baselineCapabilitySet: None
    channel: stable-4.14
    clusterID: 5cfc0491-5a23-4383-935b-71e3c793e875
  status:
    availableUpdates: null
    capabilities:
      enabledCapabilities:
      - NodeTuning
      - marketplace
      knownCapabilities:
      - CSISnapshot
      - Console
      - Insights
      - NodeTuning
      - Storage
      - baremetal
      - marketplace
      - openshift-samples
    conditions:
    - lastTransitionTime: "2023-06-13T15:16:33Z"
      message: 'Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.14&id=5cfc0491-5a23-4383-935b-71e3c793e875&version=4.14.0-0.ci-2023-06-13-083232":
        dial tcp 54.211.39.83:443: connect: network is unreachable'
      reason: RemoteFailed
      status: "False"
      type: RetrievedUpdates
    - lastTransitionTime: "2023-06-13T15:16:33Z"
      message: Capabilities match configured spec
      reason: AsExpected
      status: "False"
      type: ImplicitlyEnabledCapabilities
    - lastTransitionTime: "2023-06-13T15:16:33Z"
      message: Payload loaded version="4.14.0-0.ci-2023-06-13-083232" image="registry.kni-qe-28.ptp.lab.eng.bos.redhat.com:5000/openshift-release-dev/ocp-release@sha256:826bb878c5a1469ee8bb991beebc38a4e25b8f5cef9cdf1931ef99ffe5ffbc80"
        architecture="amd64"
      reason: PayloadLoaded
      status: "True"
      type: ReleaseAccepted
    - lastTransitionTime: "2023-06-13T15:16:33Z"
      status: "False"
      type: Available
    - lastTransitionTime: "2023-06-13T15:41:36Z"
      message: Cluster operator monitoring is not available
      reason: ClusterOperatorNotAvailable
      status: "True"
      type: Failing
    - lastTransitionTime: "2023-06-13T15:16:33Z"
      message: 'Unable to apply 4.14.0-0.ci-2023-06-13-083232: the cluster operator
        monitoring is not available'
      reason: ClusterOperatorNotAvailable
      status: "True"
      type: Progressing
    desired:
      image: registry.kni-qe-28.ptp.lab.eng.bos.redhat.com:5000/openshift-release-dev/ocp-release@sha256:826bb878c5a1469ee8bb991beebc38a4e25b8f5cef9cdf1931ef99ffe5ffbc80
      version: 4.14.0-0.ci-2023-06-13-083232
    history:
    - completionTime: null
      image: registry.kni-qe-28.ptp.lab.eng.bos.redhat.com:5000/openshift-release-dev/ocp-release@sha256:826bb878c5a1469ee8bb991beebc38a4e25b8f5cef9cdf1931ef99ffe5ffbc80
      startedTime: "2023-06-13T15:16:33Z"
      state: Partial
      verified: false
      version: 4.14.0-0.ci-2023-06-13-083232
    observedGeneration: 2
    versionHash: H6tRc6p_ZWU=
kind: List
metadata:
  resourceVersion: ""

[kni@registry.kni-qe-28 ~]$ oc get co -A
NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.0-0.ci-2023-06-13-083232   True        False         False      14m     
cloud-controller-manager                   4.14.0-0.ci-2023-06-13-083232   True        False         False      24m     
cloud-credential                           4.14.0-0.ci-2023-06-13-083232   True        False         False      25m     
cluster-autoscaler                         4.14.0-0.ci-2023-06-13-083232   True        False         False      24m     
config-operator                            4.14.0-0.ci-2023-06-13-083232   True        False         False      25m     
control-plane-machine-set                  4.14.0-0.ci-2023-06-13-083232   True        False         False      24m     
dns                                        4.14.0-0.ci-2023-06-13-083232   True        False         False      19m     
etcd                                       4.14.0-0.ci-2023-06-13-083232   True        False         False      22m     
image-registry                             4.14.0-0.ci-2023-06-13-083232   True        False         False      14m     
ingress                                    4.14.0-0.ci-2023-06-13-083232   True        False         False      25m     
kube-apiserver                             4.14.0-0.ci-2023-06-13-083232   True        False         False      18m     
kube-controller-manager                    4.14.0-0.ci-2023-06-13-083232   True        False         False      19m     
kube-scheduler                             4.14.0-0.ci-2023-06-13-083232   True        False         False      17m     
kube-storage-version-migrator              4.14.0-0.ci-2023-06-13-083232   True        False         False      25m     
machine-api                                4.14.0-0.ci-2023-06-13-083232   True        False         False      25m     
machine-approver                           4.14.0-0.ci-2023-06-13-083232   True        False         False      24m     
machine-config                             4.14.0-0.ci-2023-06-13-083232   True        False         False      21m     
marketplace                                4.14.0-0.ci-2023-06-13-083232   True        False         False      25m     
monitoring                                                                 False       True          True       14m     reconciling Console Plugin failed: creating ConsolePlugin object failed: the server could not find the requested resource (post consoleplugins.console.openshift.io)
network                                    4.14.0-0.ci-2023-06-13-083232   True        False         False      26m     
node-tuning                                4.14.0-0.ci-2023-06-13-083232   True        False         False      25m     
openshift-apiserver                        4.14.0-0.ci-2023-06-13-083232   True        False         False      14m     
openshift-controller-manager               4.14.0-0.ci-2023-06-13-083232   True        False         False      18m     
operator-lifecycle-manager                 4.14.0-0.ci-2023-06-13-083232   True        False         False      25m     
operator-lifecycle-manager-catalog         4.14.0-0.ci-2023-06-13-083232   True        False         False      25m     
operator-lifecycle-manager-packageserver   4.14.0-0.ci-2023-06-13-083232   True        False         False      19m     
service-ca                                 4.14.0-0.ci-2023-06-13-083232   True        False         False      25m    

Version-Release number of selected component (if applicable):
4.14

How reproducible:

100%

Steps to Reproduce:

1. Deploy RAN DU spoke cluster via gitops ZTP approach with multiple base capabilities disabled including Console operator.
   spec:
     capabilities:
       additionalEnabledCapabilities:
         - marketplace
         - NodeTuning
       baselineCapabilitySet: None
     channel: stable-4.14
2. Monitor ocp deployment on spoke.

Actual results:

Deployment fails while finalizing the agentclusterinstall. clusterversions on the spoke report "the cluster operator monitoring is not available"

Expected results:

Successful spoke deployment

Additional info:

After manually enabling console in clusterversion, the monitoring operator succeeded and OCP install completed
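
For reference, the manual workaround amounts to adding Console to the additional enabled capabilities in the ClusterVersion spec, roughly as sketched below; the capability name is taken from the knownCapabilities list shown above:

spec:
  capabilities:
    additionalEnabledCapabilities:
      - marketplace
      - NodeTuning
      - Console
    baselineCapabilitySet: None
  channel: stable-4.14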

must-gather logs:
https://drive.google.com/file/d/19zO21jqcVTIkAdGS2DEqQuhg2oGUmuNY/view?usp=sharing
https://drive.google.com/file/d/1PXjZmBdMwHWNwkaXr2wE9tTtBRJWYeKP/view?usp=sharing

 

From https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-jenkins-e2e-remote-libvirt-ppc64le/1423947091704549376:

```
alert TargetDown fired for 13 seconds with labels:

{job="machine-config-daemon", namespace="openshift-machine-config-operator", service="machine-config-daemon", severity="warning"}

```

Checking kubelet logs for all the nodes:
```
Aug 07 10:11:49.788245 libvirt-ppc64le-1-1-9-kfv8v-master-0 crio[1244]: time="2021-08-07 10:11:49.788169211Z" level=info msg="Started container dd7e2473c51870c1894531af9a3935b907340a31216f85c32e391bddf22d7fd0: openshift-machine-config-operator/machine-config-daemon-7r2bb/machine-config-daemon" id=15456b41-39c9-41ce-8f10-71398df6dd26 name=/runtime.v1alpha2.RuntimeService/StartContainer
Aug 07 10:11:49.265439 libvirt-ppc64le-1-1-9-kfv8v-master-1 crio[1242]: time="2021-08-07 10:11:49.264443242Z" level=info msg="Created container 0651d7904d63a3f2c1fa9177d2ccf890c8fc769e96c836074aa8cc28a8bd7e04: openshift-machine-config-operator/machine-config-daemon-pk29l/machine-config-daemon" id=a622e284-7d45-4b72-b271-c39081c2c77a name=/runtime.v1alpha2.RuntimeService/CreateContainer
Aug 07 10:11:49.602420 libvirt-ppc64le-1-1-9-kfv8v-master-2 crio[1243]: time="2021-08-07 10:11:49.602359290Z" level=info msg="Started container 5a24f464210595cd394aacd4e98903a196d67762a53d764bd6f4a6010cc17acf: openshift-machine-config-operator/machine-config-daemon-69fw6/machine-config-daemon" id=89b0650c-741e-4c61-ab49-f68aa82cb302 name=/runtime.v1alpha2.RuntimeService/StartContainer
Aug 07 10:15:54.666525 libvirt-ppc64le-1-1-9-kfv8v-worker-0-gddxw crio[1252]: time="2021-08-07 10:15:54.666233168Z" level=info msg="Started container 8ba32989af629e00c35578c51e9b5612ca8ddcf97b32f2b500d777a6eb2ff2e1: openshift-machine-config-operator/machine-config-daemon-5tb88/machine-config-daemon" id=4fa0e2ba-54aa-41a8-ab7b-7a3b6f6a9998 name=/runtime.v1alpha2.RuntimeService/StartContainer
Aug 07 10:16:14.170188 libvirt-ppc64le-1-1-9-kfv8v-worker-0-p76x7 crio[1235]: time="2021-08-07 10:16:14.170137303Z" level=info msg="Started container 78d933af1e7100050332b1df62e67d1fc71ca735c7a7d3c060411f61f32a0c74: openshift-machine-config-operator/machine-config-daemon-k6l8w/machine-config-daemon" id=c344fd94-abeb-4393-87f3-5bcaba21d45f name=/runtime.v1alpha2.RuntimeService/StartContainer
```

All containers started before the test started (before 2021-08-07T10:28:00Z, see https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-jenkins-e2e-remote-libvirt-ppc64le/1423947091704549376/build-log.txt). Checking https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.9-ocp-jenkins-e2e-remote-libvirt-ppc64le/1423947091704549376/artifacts/ocp-jenkins-e2e-remote-libvirt-ppc64le/gather-libvirt/artifacts/pods.json:

```
machine-config-daemon-5tb88_machine-config-daemon.log: assigned to libvirt-ppc64le-1-1-9-kfv8v-worker-0-gddxw, 0 restarts, ready since 2021-08-07T10:16:07Z
machine-config-daemon-k6l8w_machine-config-daemon.log: assigned to libvirt-ppc64le-1-1-9-kfv8v-worker-0-p76x7, 0 restarts, ready since 2021-08-07T10:16:14Z
machine-config-daemon-69fw6_machine-config-daemon.log: assigned to libvirt-ppc64le-1-1-9-kfv8v-master-2, 0 restarts, ready since 2021-08-07T10:11:49Z
machine-config-daemon-pk29l_machine-config-daemon.log: assigned to libvirt-ppc64le-1-1-9-kfv8v-master-1, 0 restarts, ready since 2021-08-07T10:11:49Z
machine-config-daemon-7r2bb_machine-config-daemon.log: assigned to libvirt-ppc64le-1-1-9-kfv8v-master-0, 0 restarts, ready since 2021-08-07T10:11:49Z
```

All containers were running since they got created and never restarted.

The incident (alert TargetDown fired for 13 seconds) occurred at August 7, 2021 10:33:18 AM. The test suite finished 2021-08-07T10:33:40Z.

Based on the TargetDown definition (see https://github.com/openshift/cluster-monitoring-operator/blob/001eccd81ff51af0ed7a9d463dd35bfa9b75d102/assets/cluster-monitoring-operator/prometheus-rule.yaml#L16-L28):
```
- alert: TargetDown
  annotations:
    description: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.service
      }} targets in {{ $labels.namespace }} namespace have been unreachable for
      more than 15 minutes. This may be a symptom of network connectivity issues,
      down nodes, or failures within these components. Assess the health of the
      infrastructure and nodes running these targets and then contact support.'
    summary: Some targets were not reachable from the monitoring server for an
      extended period of time.
  expr: |
    100 * (count(up == 0 unless on (node) max by (node) (kube_node_spec_unschedulable == 1)) BY (job, namespace, service) /
    count(up unless on (node) max by (node) (kube_node_spec_unschedulable == 1)) BY (job, namespace, service)) > 10
  for: 15m
```

The machine-config-daemon was down for 15m and 13s. Given the test suite ran for ~5m42s (10:33:18-10:28:00), the target was down before the test suite started to run.

This pattern repeats in other jobs as well:

For other jobs see:
https://search.ci.openshift.org/?search=alert+TargetDown+fired+for+.*+seconds+with+labels%3A+%5C%7Bjob%3D%22machine-config-daemon%22%2C+namespace%3D%22openshift-machine-config-operator%22%2C+service%3D%22machine-config-daemon%22%2C+severity%3D%22warning%22%5C%7D&maxAge=48h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Please review the following PR: https://github.com/openshift/cluster-monitoring-operator/pull/1952

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Go to console 
2. Click  on "Installed Operator"
3. Add operator (Node feature discovery )
4. Click on all instances that on Create new (see image)

Actual results:

The dropdown is empty, but as a user you can click the entries and get to the new instance YAML.

Expected results:

For a better user experience there should at least be some labels or clickable text.

Additional info:

 

Description of problem:

When a MCCPoolAlert is fired and we fix the problem that caused this alert, the alert is not removed.
 

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-06-06-212044   True        False         114m    Cluster version is 4.14.0-0.nightly-2023-06-06-212044
 

How reproducible:

Always
 

Steps to Reproduce:

1. Create a custom MCP

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [master,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""


2. Label a master node so that it is included in the new custom MCP

$ oc label node $(oc get nodes -l node-role.kubernetes.io/master -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/infra=""

3. Verify that the alert is fired

alias thanosalerts='curl -s -k -H "Authorization: Bearer $(oc -n openshift-monitoring create token prometheus-k8s)" https://$(oc get route -n openshift-monitoring thanos-querier -o jsonpath={.spec.host})/api/v1/alerts | jq '

$ thanosalerts |grep alertname
  ....
          "alertname": "MCCPoolAlert",


4. Remove the label from the node to fix the problem

$ oc label node $(oc get nodes -l node-role.kubernetes.io/master -ojsonpath="{.items[0].metadata.name}") node-role.kubernetes.io/infra-

Actual results:

The alert is not removed.

When we have a look at the mcc_pool_alert  metric we find 2 values with 2 different "alert" fields.

alias thanosquery='function __lgb() { unset -f __lgb; oc rsh -n openshift-monitoring prometheus-k8s-0 curl -s -k  -H "Authorization: Bearer $(oc -n openshift-monitoring create token prometheus-k8s)" --data-urlencode "query=$1" https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query | jq -c | jq; }; __lgb'

$ thanosquery mcc_pool_alert
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "mcc_pool_alert",
          "alert": "Applying custom label for pool",
          "container": "oauth-proxy",
          "endpoint": "metrics",
          "instance": "10.130.0.86:9001",
          "job": "machine-config-controller",
          "namespace": "openshift-machine-config-operator",
          "node": "ip-10-0-129-20.us-east-2.compute.internal",
          "pod": "machine-config-controller-76dbddff49-75ggr",
          "pool": "infra",
          "prometheus": "openshift-monitoring/k8s",
          "service": "machine-config-controller"
        },
        "value": [
          1686137977.158,
          "0"
        ]
      },
      {
        "metric": {
          "__name__": "mcc_pool_alert",
          "alert": "Given both master and custom pools. Defaulting to master: custom infra",
          "container": "oauth-proxy",
          "endpoint": "metrics",
          "instance": "10.130.0.86:9001",
          "job": "machine-config-controller",
          "namespace": "openshift-machine-config-operator",
          "node": "ip-10-0-129-20.us-east-2.compute.internal",
          "pod": "machine-config-controller-76dbddff49-75ggr",
          "pool": "infra",
          "prometheus": "openshift-monitoring/k8s",
          "service": "machine-config-controller"
        },
        "value": [
          1686137977.158,
          "1"
        ]
      }
    ]
  }
}
 

Expected results:

The alert should be removed.
 

Additional info:

If we remove the MCO controller pod, new mcc_pool_alert data is generated with the right value and the other values are removed. If we execute this workaround, the alert is removed.
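
A sketch of that workaround with oc, assuming the standard machine-config-controller deployment name:

# Recreate the machine-config-controller pod so the stale mcc_pool_alert series is dropped
oc -n openshift-machine-config-operator rollout restart deployment/machine-config-controller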

 

The final iteration (of 3) of the fix for OCPBUGS-4248 - https://github.com/openshift/cluster-baremetal-operator/pull/341 - uses the (IPv6) API VIP as the IP address for IPv6 BMCs to contact Apache to download the image to mount via virtualmedia.

Since Apache runs as part of the metal3 Deployment, it exists on only one node. There is no guarantee that the API VIP will land (or stay) on the same node, so this fails to work more often than not. Kube-proxy does not do anything to redirect traffic to pods with host networking enabled, such as the metal3 Deployment.

The IPv6 is passed to the baremetal-operator. This has been split into its own Deployment since the first iteration of OCPBUGS-4228, in which we collected the IP address of the host from the deployed metal3 Pod. At the time that caused a circular dependency of the Deployment on its own Pod, but this would no longer be the case. However, a backport beyond 4.14 would require the Deployment split to also be backported.

Alternatively, ironic-proxy could be adapted to also proxy the images produced by ironic. This would be new functionality that would also need to be backported.

Finally, we could determine the host IP from inside the baremetal-operator container instead of from cluster-baremetal-operator. However, this approach has not been tried and would only work in backports because it relies on baremetal-operator continuing to run within same Pod as ironic.

This is a clone of issue OCPBUGS-24375. The following is the description of the original issue:

Description of problem:

oc process command fails while running it with a template file

Version-Release number of selected component (if applicable):

4.12.41

How reproducible:

100%

Steps to Reproduce:

1. Create a new project and a template file 
$ oc new-project test
$ oc get template httpd-example -n openshift -o yaml > /tmp/template_http.yaml 

2. Run oc process command as given below
$ oc process -f /tmp/template_http.yaml 
error: unable to process template: the namespace of the provided object does not match the namespace sent on the request

3. When we run this command referencing the template from the other namespace, it runs fine.
$ oc process openshift//httpd-example

4. $ oc version
Client Version: 4.12.41
Kustomize Version: v4.5.7
Server Version: 4.12.42
Kubernetes Version: v1.25.14+bcb9a60

Actual results:

$ oc process -f /tmp/template_http.yaml
error: unable to process template: the namespace of the provided object does not match the namespace sent on the request

Expected results:

Command should display the output of resources it will create

Additional info:
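
A possible workaround (a sketch, not verified against this exact version) is to process the template locally with the existing --local flag, which skips the server-side namespace check:

oc process --local -f /tmp/template_http.yaml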

    

Please review the following PR: https://github.com/openshift/console-operator/pull/737

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

aws-proxy jobs are failing with workers unable to come up. Example job run[1].  On the console, the workers report 500 errors trying to retrieve the worker ignition[2]. 

Is it possible https://github.com/openshift/machine-config-operator/pull/3662 broke things? See logs below.


[1] https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-proxy/1648560213655031808
[2] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-aws-ovn-proxy/1648560213655031808/artifacts/e2e-aws-ovn-proxy/gather-aws-console/artifacts/i-071b5af3ddb12e55c

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1.  Install with a proxy

Actual results:

No workers come up

Expected results:

 

Additional info:

Logs are reporting: 

2023-04-19T12:29:38.244051716Z I0419 12:29:38.244006 1 container_runtime_config_controller.go:415] Error syncing image config openshift-config: could not get ControllerConfig controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found
2023-04-19T12:29:56.507515526Z I0419 12:29:56.507472 1 render_controller.go:377] Error syncing machineconfigpool worker: controllerconfig.machineconfiguration.openshift.io "machine-config-controller" not found

./pods/machine-config-operator-6d7c6c8ccf-m7c57/machine-config-operator/machine-config-operator/logs/current.log:2023-04-19T12:38:15.240508503Z E0419 12:38:15.240437 1 operator.go:342] ControllerConfig.machineconfiguration.openshift.io "machine-config-controller" is invalid: [spec.proxy.apiVersion: Required value: must not be empty, spec.proxy.kind: Required value: must not be empty, <nil>: Invalid value: "null": some validation rules were not checked because the object was invalid; correct the existing errors to complete validation]

Sanitize OWNERS/OWNER_ALIASES in all CSI driver and operator repos.

For driver repos:

1) OWNERS must have `component`:

component: "Storage / Kubernetes External Components"

2) OWNERS_ALIASES must have all team members of the Storage team.

For operator repos:

1) OWNERS must have:

  • all team members of Storage team as `approvers`
  • `component`:
    component: "Storage / Operators"
    

Description of problem:

The ConfigObserver controller waits until all given informers are marked as synced, including the build informer. However, when the build capability is disabled, this blocks ConfigObserver and it never runs.

This is likely only happening on 4.15 because the capability watching mechanism was bound to ConfigObserver in 4.15.

Version-Release number of selected component (if applicable):

4.15

How reproducible:

Launch cluster-bot cluster via "launch 4.15.0-0.nightly-2023-11-05-192858,openshift/cluster-openshift-controller-manager-operator#315 no-capabilities"

Steps to Reproduce:

1.
2.
3.

Actual results:

ConfigObserver controller stuck in failure 

Expected results:

The ConfigObserver controller runs and successfully clears all deployer service accounts when the deploymentconfig capability is disabled.

Additional info:

 

Description of problem:

OSSM Daily builds were updated to no longer support the spec.techPreview.controlPlaneMode field and OSSM will not create a SMCP as a result. The field needs to be updated to spec.mode.

Gateway API enhanced dev preview is currently broken (we are using the latest 2.4 daily build because 2.4 is unreleased). This should be resolved before OSSM 2.4 is GA.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

100%

Steps to Reproduce:

1. Follow instructions in http://pastebin.test.redhat.com/1092754

Actual results:

CIO fails to create a SMCP

"error": "failed to create ServiceMeshControlPlane openshift-ingress/openshift-gateway: admission webhook \"smcp.validation.maistra.io\" denied the request: the spec.techPreview.controlPlaneMode field is not supported in version 2.4+; use spec.mode"

Expected results:

CIO is able to create a SMCP

Additional info:
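
For reference, a rough sketch of the field change the webhook asks for; the apiVersion, version, and mode value here are assumptions and should be checked against the OSSM 2.4 documentation:

apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
  name: openshift-gateway
  namespace: openshift-ingress
spec:
  version: v2.4
  mode: ClusterWide   # replaces the removed spec.techPreview.controlPlaneMode field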

 

Description of problem:

Currently the Knative Route Details page does not show the URL of the Route.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Install Knative Serving (Serverless Operator)
2. Create a SF from the Add Page.
3. Navigate to the Knative Routes Details page

Actual results:

No URL is shown

Expected results:

URL should be shown

Additional info:

Images: https://drive.google.com/drive/folders/13Ya0mFhDrgFIrVcq6DaLyOxZbatz82Al?usp=share_link

This is a clone of issue OCPBUGS-20525. The following is the description of the original issue:

Description of problem:

Set custom security group IDs in the installconfig.platform.aws.defaultMachinePlatform.additionalSecurityGroupIDs field of install-config.yaml

such as: 

   apiVersion: v1
   controlPlane:
     architecture: amd64
     hyperthreading: Enabled
     name: master
     platform: {}
     replicas: 3
   compute:
   - architecture: amd64
     hyperthreading: Enabled
     name: worker
     platform: {}
     replicas: 3
   metadata:
     name: gpei-test1013
   platform:
     aws:
       region: us-east-2
       subnets:
       - subnet-0bc86b64e7736479c
       - subnet-0addd33c410b52251
       - subnet-093392f94a4099566
       - subnet-0b915a53042b6dc61
       defaultMachinePlatform:
         additionalSecurityGroupIDs:
         - sg-0fbc4c9733e6c18e7
         - sg-0b46b502b575d30ba
         - sg-02a59f8662d10c6d3


After installation, check the Security Groups attached to the masters and workers: the masters don't have the specified custom security groups attached, while the workers do. 

For one of the masters:
[root@preserve-gpei-worker k_files]# aws ec2 describe-instances --instance-ids i-08c0b0b6e4308be3b  --query 'Reservations[*].Instances[*].SecurityGroups[*]' --output json
[
    [
        [
            {
                "GroupName": "terraform-20231013000602175000000002",
                "GroupId": "sg-04b104d07075afe96"
            }
        ]
    ]
]

For one of the workers:
[root@preserve-gpei-worker k_files]# aws ec2 describe-instances --instance-ids i-00643f07748ec75da --query 'Reservations[*].Instances[*].SecurityGroups[*]' --output json
[
    [
        [
            {
                "GroupName": "test-sg2",
                "GroupId": "sg-0b46b502b575d30ba"
            },
            {
                "GroupName": "terraform-20231013000602174300000001",
                "GroupId": "sg-0d7cd50d4cb42e513"
            },
            {
                "GroupName": "test-sg3",
                "GroupId": "sg-02a59f8662d10c6d3"
            },
            {
                "GroupName": "test-sg1",
                "GroupId": "sg-0fbc4c9733e6c18e7"
            }
        ]
    ]
]


Also checked the master's controlplanemachineset; it does have the custom security groups configured, but they're not attached to the master instances in the end.

[root@preserve-gpei-worker k_files]# oc get controlplanemachineset -n openshift-machine-api cluster -o yaml |yq .spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.securityGroups
- filters:
    - name: tag:Name
      values:
        - gpei-test1013-8lwtb-master-sg
- id: sg-02a59f8662d10c6d3
- id: sg-0b46b502b575d30ba
- id: sg-0fbc4c9733e6c18e7


Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-12-104602

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:

It works well when setting the security groups in installconfig.controlPlane.platform.aws.additionalSecurityGroupIDs
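
For comparison, a sketch of the per-pool form that does work, using the same security group IDs as above:

controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  replicas: 3
  platform:
    aws:
      additionalSecurityGroupIDs:
        - sg-0fbc4c9733e6c18e7
        - sg-0b46b502b575d30ba
        - sg-02a59f8662d10c6d3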

Description of problem:

OpenShift Console does not filter the SecretList when displaying the ServiceAccount details page

When reviewing the details page of an OpenShift ServiceAccount, at the bottom of the page there is a SecretsList which is intended to display all of the relevant Secrets that are attached to the ServiceAccount.

In OpenShift 4.8.X, this SecretList only displayed the relevant Secrets. In OpenShift 4.9+ the SecretList now displays all Secrets within the entire Namespace.

Version-Release number of selected component (if applicable):

4.8.57 < Most recent release without issue
4.9.0 < First release with issue 
4.10.46 < Issue is still present

How reproducible:

Everytime

Steps to Reproduce:

1. Deploy a cluster with OpenShift 4.8.57 
      (or replace the OpenShift Console image with `sha256:9dd115a91a4261311c44489011decda81584e1d32982533bf69acf3f53e17540` )
2. Access the ServiceAccounts Page ( User Management -> ServiceAccounts)
3. Click a ServiceAccount to display the Details page
4. Scroll down and review the Secrets section
5. Repeat steps with an OpenShift 4.9 release 
   (or check using image `sha256:fc07081f337a51f1ab957205e096f68e1ceb6a5b57536ea6fc7fbcea0aaaece0` )

Actual results:

All Secrets in the Namespace are displayed

Expected results:

Only Secrets associated with the ServiceAccount are displayed

Additional info:

Lightly reviewing the code, the following links might be a good start:
- https://github.com/openshift/console/blob/master/frontend/public/components/secret.jsx#L126
- https://github.com/openshift/console/blob/master/frontend/public/components/service-account.jsx#L151:L151

Description of problem:

The script for checking the certificates for OpenShift installation on OpenStack fails. 

https://docs.openshift.com/container-platform/4.12/installing/installing_openstack/preparing-to-install-on-openstack.html#security-osp-validating-certificates_preparing-to-install-on-openstack

I see that the command "openstack catalog list --format json --column Name --column Endpoints" returns output as,

-----------
[
  {
    "Name": "heat-cfn",
    "Endpoints": "RegionOne\n  admin: http://10.254.x.x:8000/v1\nRegionOne\n  public: https://<domain_name>:8000/v1\nRegionOne\n  internal: http://10.254.x.x:8000/v1\n"
  },
  {
    "Name": "cinderv2",
    "Endpoints": "RegionOne\n  admin: http://10.254.x.x:8776/v2/f36f2db6bb434484b71a45aa84b9d790\nRegionOne\n  internal: http://10.254.x.x:8776/v2/f36f2db6bb434484b71a45aa84b9d790\nRegionOne\n  public: https://<domain_name>:8776/v2/f36f2db6bb434484b71a45aa84b9d790\n"
  },
  {
    "Name": "glance",
    "Endpoints": "RegionOne\n  public: https://<domain_name>:9292\nRegionOne\n  admin: http://10.254.x.x:9292\nRegionOne\n  internal: http://10.254.x.x:9292\n"
  },
  {
    "Name": "keystone",
    "Endpoints": "RegionOne\n  internal: http://10.254.x.x:5000\nRegionOne\n  admin: http://10.254.x.x:35357\nRegionOne\n  public: https://<domain_name>:5000\n"
  },
  {
    "Name": "swift",
    "Endpoints": "RegionOne\n  admin: https://ch-dc-s3-gsn-33.eecloud.nsn-net.net:10032/swift/v1\nRegionOne\n  public: https://ch-dc-s3-gsn-33.eecloud.nsn-net.net:10032/swift/v1\nRegionOne\n  internal: https://ch-dc-s3-gsn-33.eecloud.nsn-net.net:10032/swift/v1\n"
  },
  {
    "Name": "nova",
    "Endpoints": "RegionOne\n  public: https://<domain_name>:8774/v2.1\nRegionOne\n  internal: http://10.254.x.x:8774/v2.1\nRegionOne\n  admin: http://10.254.x.x:8774/v2.1\n"
  },
  {
    "Name": "heat",
    "Endpoints": "RegionOne\n  internal: http://10.254.x.x:8004/v1/f36f2db6bb434484b71a45aa84b9d790\nRegionOne\n  public: https://<domain_name>:8004/v1/f36f2db6bb434484b71a45aa84b9d790\nRegionOne\n  admin: http://10.254.x.x:8004/v1/f36f2db6bb434484b71a45aa84b9d790\n"
  },
  {
    "Name": "cinder",
    "Endpoints": ""
  },
  {
    "Name": "cinderv3",
    "Endpoints": "RegionOne\n  public: https://<domain_name>:8776/v3/f36f2db6bb434484b71a45aa84b9d790\nRegionOne\n  admin: http://10.254.x.x:8776/v3/f36f2db6bb434484b71a45aa84b9d790\nRegionOne\n  internal: http://10.254.x.x:8776/v3/f36f2db6bb434484b71a45aa84b9d790\n"
  },
  {
    "Name": "neutron",
    "Endpoints": "RegionOne\n  internal: http://10.254.x.x:9696\nRegionOne\n  public: https://<domain_name>:9696\nRegionOne\n  admin: http://10.254.x.x:9696\n"
  },
  {
    "Name": "placement",
    "Endpoints": "RegionOne\n  internal: http://10.254.x.x:8778\nRegionOne\n  admin: http://10.254.x.x:8778\nRegionOne\n  public: https://<domain_name>:8778\n"
  }
]
-----------

This output is then expected to be filtered with jq as " | jq -r '.[] | .Name as $name | .Endpoints[] | [$name, .interface, .url] | join(" ")' | sort "


But it fails with an error:

----------------
./certs.sh
jq: error (at <stdin>:46): Cannot iterate over string ("RegionOne\...)

Further checking the script, the following command's execution is failing:
 openstack catalog list --format json --column Name --column Endpoints \
> | jq -r '.[] | .Name as $name | .Endpoints[] | [$name, .interface, .url] | join(" ")'
jq: error (at <stdin>:46): Cannot iterate over string ("RegionOne\...)
----------------

Where certs.sh is the script we copied from documentation.

I did some debugging to map .interface and .url to the internal/public/admin fields from the endpoint string, but I'm not sure whether that is how it is meant to work on OpenStack, so I'm marking this as a BZ to have it reviewed.
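
As a workaround sketch only (not the documented command): since Endpoints is a single string here, it can be split on newlines instead of being iterated as an array. The regex below is an assumption based on the catalog output shown above:

openstack catalog list --format json --column Name --column Endpoints \
  | jq -r '.[] | .Name as $name
      | (.Endpoints | split("\n")[] | select(test(": "))
         | capture("^ *(?<interface>[a-z]+): (?<url>.+)$"))
      | [$name, .interface, .url] | join(" ")' \
  | sort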

 

 

 

 

Version-Release number of selected component (if applicable):

Openshift Container Platform 4.12 on 3.18.1 release of openstack 

How reproducible:

- Always

Steps to Reproduce:

1. Copy the script and run it on the given OpenStack release.
2.
3.

Actual results:

Fails with a parsing error.

Expected results:

Shouldn't fail.

Additional info:

 

Description of problem:

Once a user changes the log component in a master node's Logs section, they are unable to select a different log component from the dropdown.

To select a different log component, the user needs to revisit the Logs section under the master node again; this refreshes the pane and reloads the default options.

 

Version-Release number of selected components (if applicable):

4.11.0-0.nightly-2022-08-15-152346

How reproducible:

 Always

Steps to Reproduce:

  1. Login to OCP web console.
  2. Go to Compute >  Nodes > Click on one of the master nodes.
  3. Go to the Logs section.
  4. Change the dropdown value from journal to openshift-apiserver ( also select audit log)
  5. Try to change the dropdown value from openshift-apiserver to journal/kube-apiserver/oauth-apiserver.
  6. View the behavior.

Actual results:

Unable to select or change the log component once the user already made a selection from the dropdown under master nodes' logs section.

Expected results:

Users should be able to change or select the log component from the master node's Logs section whenever required, using the available dropdown.

Additional info:

Reproduced in both chrome[103.0.5060.114 (Official Build) (64-bit)] and firefox[91.11.0esr (64-bit)] browsers
Attached a screen capture of the same: ScreenRecorder_2022-08-16_26457662-aea5-4a00-aeb4-0fbddf8f16f0.mp4

Description of problem:

When deploying 4.12 spoke clusters (using rhcos-412.86.202306132230-0-live.x86_64.iso) or 4.10 spoke clusters from a 4.14.0-ec.4 hub, the BMH gets stuck in the provisioning state due to: Failed to update hostname: Command '['chroot', '/mnt/coreos', 'hostnamectl', 'hostname']' returned non-zero exit status 1. Running `hostnamectl hostname` returns `Unknown operation hostname`. It looks like older versions of hostnamectl do not support the hostname option.
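
For reference, a compatibility sketch (the `hostname` verb only exists in newer systemd releases; the forms below work on both old and new systemd, and the chroot path is the one from the error above):

chroot /mnt/coreos hostnamectl status                          # read the hostname on old and new systemd
chroot /mnt/coreos hostnamectl set-hostname "$NEW_HOSTNAME"    # set it ($NEW_HOSTNAME is a placeholder)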

Version-Release number of selected component (if applicable):

4.14.0-ec.4

How reproducible:

100%

Steps to Reproduce:

1. From a 4.14.0-ec.4 hub cluster deploy a 4.12 spoke cluster using rhcos-412.86.202306132230-0-live.x86_64.iso via ZTP procedure

Actual results:

BMH stuck in provisioning state

Expected results:

BMH gets provisioned

Additional info:

I also tried using a 4.14 iso image to deploy the 4.12 payload but then kubelet would fail with err="failed to parse kubelet flag: unknown flag: --container-runtime"

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.

2.

3.

 

Actual results:

 

Expected results:

 

Additional info:

Please fill in the following template while reporting a bug and provide as much relevant information as possible. Doing so will give us the best chance to find a prompt resolution.

Affected Platforms:

Is it an

  1. internal CI failure 
  2. customer issue / SD
  3. internal RedHat testing failure

 

If it is an internal RedHat testing failure:

  • Please share a kubeconfig or creds to a live cluster for the assignee to debug/troubleshoot along with reproducer steps (specially if it's a telco use case like ICNI, secondary bridges or BM+kubevirt).

 

If it is a CI failure:

 

  • Did it happen in different CI lanes? If so please provide links to multiple failures with the same error instance
  • Did it happen in both sdn and ovn jobs? If so please provide links to multiple failures with the same error instance
  • Did it happen in other platforms (e.g. aws, azure, gcp, baremetal etc) ? If so please provide links to multiple failures with the same error instance
  • When did the failure start happening? Please provide the UTC timestamp of the networking outage window from a sample failure run
  • If it's a connectivity issue,
  • What is the srcNode, srcIP and srcNamespace and srcPodName?
  • What is the dstNode, dstIP and dstNamespace and dstPodName?
  • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)

 

If it is a customer / SD issue:

 

  • Provide enough information in the bug description that Engineering doesn’t need to read the entire case history.
  • Don’t presume that Engineering has access to Salesforce.
  • Please provide must-gather and sos-report with an exact link to the comment in the support case with the attachment.  The format should be: https://access.redhat.com/support/cases/#/case/<case number>/discussion?attachmentId=<attachment id>
  • Describe what each attachment is intended to demonstrate (failed pods, log errors, OVS issues, etc).  
  • Referring to the attached must-gather, sosreport or other attachment, please provide the following details:
    • If the issue is in a customer namespace then provide a namespace inspect.
    • If it is a connectivity issue:
      • What is the srcNode, srcNamespace, srcPodName and srcPodIP?
      • What is the dstNode, dstNamespace, dstPodName and  dstPodIP?
      • What is the traffic path? (examples: pod2pod? pod2external?, pod2svc? pod2Node? etc)
      • Please provide the UTC timestamp networking outage window from must-gather
      • Please provide tcpdump pcaps taken during the outage filtered based on the above provided src/dst IPs
    • If it is not a connectivity issue:
      • Describe the steps taken so far to analyze the logs from networking components (cluster-network-operator, OVNK, SDN, openvswitch, ovs-configure etc) and the actual component where the issue was seen based on the attached must-gather. Please attach snippets of relevant logs around the window when problem has happened if any.
  • For OCPBUGS in which the issue has been identified, label with “sbr-triaged”
  • For OCPBUGS in which the issue has not been identified and needs Engineering help for root cause, labels with “sbr-untriaged”
  • Note: bugs that do not meet these minimum standards will be closed with label “SDN-Jira-template”

Description of problem:

In order to test proxy installations, the CI base image for OpenShift on OpenStack needs netcat.

Description of problem:

Azure CCM should be GA before the end of 4.14. When we previously tried to promote it there were issues, so we need to improve the feature gates promotion so that we can promote all components in a single release.
And then promote the CCM to GA once those changes are in place.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

dependencies for the ironic containers are quite old, we need to upgrade them to the latest available to keep up with upstream requirements

Description of problem:

The agent installer integration test fails because the base ISO's kargs.json changed and now uses fedora-coreos instead of rhcos. As the integration test performs a strict check using the `cmp` function, the test fails because "coreos.liveiso=fedora-coreos-38.20230609.3.0" is absent from the expected result of the integration test.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Get latest code from master branch
2. Run ./hack/go-integration-test.sh

Actual results:

INFO[2023-09-01T02:23:01Z] --- FAIL: TestAgentIntegration (369.83s)
    --- FAIL: TestAgentIntegration/agent_pxe_configurations (0.00s)
        --- FAIL: TestAgentIntegration/agent_pxe_configurations/sno (49.93s)
            testscript.go:520: # Verify a default configuration for the SNO topology (49.805s)
                > exec openshift-install agent create pxe-files --dir $WORK
                [stderr]
                level=warning msg=CPUPartitioning:  is ignored
                level=info msg=Configuration has 1 master replicas and 0 worker replicas
                level=info msg=The rendezvous host IP (node0 IP) is 192.168.111.20
                level=info msg=Extracting base ISO from release payload
                level=info msg=Verifying cached file
                level=info msg=Using cached Base ISO /.cache/agent/image_cache/coreos-x86_64.iso
                level=info msg=Consuming Install Config from target directory
                level=info msg=Consuming Agent Config from target directory
                level=info msg=Created iPXE script agent.x86_64.ipxe in $WORK/pxe directory
                level=info msg=PXE-files created in: $WORK/pxe
                level=info msg=Kernel parameters for PXE boot: coreos.liveiso=fedora-coreos-38.20230609.3.0 ignition.firstboot ignition.platform.id=metal
                > stderr 'Created iPXE script agent.x86_64.ipxe'
                > exists $WORK/pxe/agent.x86_64-initrd.img
                > exists $WORK/pxe/agent.x86_64-rootfs.img
                > exists $WORK/pxe/agent.x86_64-vmlinuz
                > exists $WORK/auth/kubeconfig
                > exists $WORK/auth/kubeadmin-password
                > cmp $WORK/pxe/agent.x86_64.ipxe $WORK/expected/agent.x86_64.ipxe
                diff $WORK/pxe/agent.x86_64.ipxe $WORK/expected/agent.x86_64.ipxe
                --- $WORK/pxe/agent.x86_64.ipxe
                +++ $WORK/expected/agent.x86_64.ipxe
                @@ -1,4 +1,4 @@
                 #!ipxe
                 initrd --name initrd http://user-specified-pxe-infra.com/agent.x86_64-initrd.img
                -kernel http://user-specified-pxe-infra.com/agent.x86_64-vmlinuz initrd=initrd coreos.live.rootfs_url=http://user-specified-pxe-infra.com/agent.x86_64-rootfs.img coreos.liveiso=fedora-coreos-38.20230609.3.0 ignition.firstboot ignition.platform.id=metal
                +kernel http://user-specified-pxe-infra.com/agent.x86_64-vmlinuz initrd=initrd coreos.live.rootfs_url=http://user-specified-pxe-infra.com/agent.x86_64-rootfs.img ignition.firstboot ignition.platform.id=metal
                 boot

                FAIL: testdata/agent/pxe/configurations/sno.txt:13: $WORK/pxe/agent.x86_64.ipxe and $WORK/expected/agent.x86_64.ipxe differ

Expected results:

Test should always pass

Additional info:

 

Description of problem:

mTLS connection is not working when using an intermediate CA apart from the root CA, both with CRLs defined.
The Intermediate CA Cert had a published CDP which directed to a CRL issued by the root CA.

The config map in the openshift-ingress namespace contains the CRL as issued by the root CA. The CRL issued by the Intermediate CA is not present since that CDP is in the user cert and so not in the bundle.

When attempting to connect using a user certificate issued by the Intermediate CA it fails with an error of unknown CA.

When attempting to connect using a user certificate issued by the Root CA, the connection is successful.

Version-Release number of selected component (if applicable):

4.10.24

How reproducible:
Always

Steps to Reproduce:

1. Configure CA and intermediate CA with CRL
2. Sign client certificate with the intermediate CA
3. Configure mtls in openshift-ingress

Actual results:

When attempting to connect using a user certificate issued by the Intermediate CA it fails with an error of unknown CA.
When attempting to connect using a user certificate issued by the Root CA, the connection is successful.

Expected results:

Be able to connect with client certificated signed by the intermediate CA

Additional info:
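
A quick way to check which CA issued each CRL versus which CA issued the client certificate, as a debugging sketch (file names are placeholders):

openssl x509 -in client-cert.pem -noout -issuer
openssl x509 -in intermediate-ca.pem -noout -text | grep -A2 'CRL Distribution'
openssl crl  -in root-ca.crl -noout -issuer
openssl crl  -in intermediate-ca.crl -noout -issuer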

Description of problem:

Currently, HyperShift squashes any user-configured proxy configuration based on these lines: https://github.com/openshift/hypershift/blob/main/support/globalconfig/proxy.go#L21-L28, https://github.com/openshift/hypershift/blob/release-4.11/control-plane-operator/hostedclusterconfigoperator/controllers/resources/resources.go#L487-L493. Because of this, any user changes to the cluster-wide proxy configuration documented here: https://docs.openshift.com/container-platform/4.12/networking/enable-cluster-wide-proxy.html are squashed and not valid for more than a few seconds. That blocks some functionality in the OpenShift cluster from working, including application builds from the OpenShift samples provided in the cluster.
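
A minimal way to observe the squash, as a sketch (spec.httpProxy is a standard Proxy field; the proxy URL is a placeholder):

oc patch proxy/cluster --type=merge -p '{"spec":{"httpProxy":"http://user-proxy.example.com:3128"}}'
watch -n2 "oc get proxy/cluster -o jsonpath='{.spec.httpProxy}'"   # the value reverts within seconds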

 

Version-Release number of selected component (if applicable):

4.13 4.12 4.11

How reproducible:

100%

Steps to Reproduce:

1. Make a change to the Proxy object in the cluster with kubectl edit proxy cluster
2. Save the change
3. Wait a few seconds

Actual results:

HostedClusterConfig operator will go in and squash the value

Expected results:

The value the user provides remains in the configuration and is not squashed to an empty value

Additional info:

 

Description of problem:

Reported in https://github.com/openshift/cluster-ingress-operator/issues/911

When you open a new issue, it still directs you to Bugzilla, and then doesn't work.

It can be changed here: https://github.com/openshift/cluster-ingress-operator/blob/master/.github/ISSUE_TEMPLATE/config.yml
, but to what?

The correct Jira link is
https://issues.redhat.com/secure/CreateIssueDetails!init.jspa?pid=12332330&issuetype=1&components=12367900&priority=10300&customfield_12316142=26752

But can the public use this mechanism? Yes - https://redhat-internal.slack.com/archives/CB90SDCAK/p1682527645965899 

Version-Release number of selected component (if applicable):

n/a

How reproducible:

May be in other repos too.

Steps to Reproduce:

1. Open Issue in the repo - click on New Issue
2. Follow directions and click on link to open Bugzilla
3. Get message that this doesn't work anymore

Actual results:

You get instructions that don't work to open a bug from an Issue.

Expected results:

You get instructions to just open an Issue, or get correct instructions on how to open a bug using Jira.

Additional info:

 

Description of problem:

Hello, one of our customers had several cni-sysctl-allowlist-ds pods created (around 10,000 pods) in the openshift-multus namespace. That caused several issues in the cluster, as nodes were full of pods and ran out of IPs.

After deleting them, the situation has improved. But we want to know the root cause of this issue.

Searching in the network-operator pod logs, it seems that the customer faced some networking issues. After this issue, we can see that the cni-sysctl-allowlist pods started to be created.

Could we know why the cni-sysctl-allowlist-ds pods were created?
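
A quick triage sketch (namespace and names taken from the report; the DaemonSet name is assumed to match the pod prefix):

oc -n openshift-multus get pods --no-headers | grep -c cni-sysctl-allowlist
oc -n openshift-multus get ds cni-sysctl-allowlist-ds -o jsonpath='{.metadata.creationTimestamp}{"\n"}'
oc -n openshift-multus get events --sort-by=.lastTimestamp | grep -i cni-sysctl-allowlist | tail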

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

When upgrading a 4.11.33 cluster to 4.12.21, the Cluster Version Operator is stuck waiting for the Network Operator to update:

$ omc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.43   True        True          14m     Working towards 4.12.21: 672 of 831 done (80% complete), waiting on network

CVO pod log states:

2023-06-16T12:07:22.596127142Z I0616 12:07:22.596023       1 metrics.go:490] ClusterOperator network is not setting the 'operator' version

Indeed the NO version is empty:

$ omc get co network -o json|jq '.status.versions'
null

However, its status is Available, not Progressing, and not Degraded:

$ omc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
network             True        False         False      19m
   
Network operator pod log states:

2023-06-16T12:08:56.542287546Z I0616 12:08:56.542271       1 connectivity_check_controller.go:138] ConnectivityCheckController is waiting for transition to desired version (4.12.21) to be completed.
2023-06-16T12:04:40.584407589Z I0616 12:04:40.584349       1 ovn_kubernetes.go:1437] OVN-Kubernetes master and node already at release version 4.12.21; no changes required

The Network Operator pod, however, has the version correctly:
$ omc get pods -n openshift-network-operator -o jsonpath='{.items[].spec.containers[0].env[?(@.name=="RELEASE_VERSION")]}'|jq
{
  "name": "RELEASE_VERSION",
  "value": "4.12.21"
}

Restarts of the related pods had no effect.  I have trace logs of the Network Operator available.  It looked like it might be related to https://github.com/openshift/cluster-network-operator/pull/1818 but that looks to be code introduced in 4.14.

 

Version-Release number of selected component (if applicable):

 

How reproducible:

I have not reproduced.

Steps to Reproduce:

1.  Cluster version began at stable 4.10.56
2.  Upgraded to 4.11.43 successfully
3.  Upgraded to 4.12.21 and is stuck. 

Actual results:

CVO stuck waiting on the NO to complete; the NO never reports its 'operator' version.

Expected results:

NO to update its version so the CVO can continue.

Additional info:

Bare Metal IPI cluster with OVN Networking.

Description of the problem:

Day-2 host stuck in insufficient

How reproducible:

100%

Steps to reproduce:

1. See CI job

Actual results:

Day-2 host stuck in insufficient

Expected results:

Day-2 host becomes known

Description of problem:

The HyperShift KubeVirt (OpenShift Virtualization) platform has worker nodes that are hosted by KubeVirt virtual machines. The worker node's internal IP address is determined by inspecting the KubeVirt VMI's vmi.status.interfaces field.

Due to the way the vmi.status.interfaces field sources its information from the qemu guest agent, that field is not guaranteed to remain static in some scenarios, such as a soft reboot or when the qemu agent is temporarily unavailable. During these situations, the interfaces list will be empty.

When the interfaces list is empty on the vmi, there are Hypershift related components (cloud-provider-kubevirt and cluster-api-provider-kubevirt) which strip the worker nodes internal IP. This stripping of the node's internal IP causes unpredictable behavior that results in connectivity failures from the KAS to the worker node kubelets.

To address this, the HyperShift-related KubeVirt components need to only update the internal IP of worker nodes when the vmi.status.interfaces list has an IP for the default interface. Otherwise these HyperShift components should use the last known internal IP address rather than stripping the internal IP address from the node.
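
A diagnostic sketch for catching the window described above (names are placeholders; the VMI is queried from the management cluster while the Node is queried from the guest cluster):

oc get vmi -n <hosted-control-plane-namespace> <vmi-name> -o jsonpath='{.status.interfaces}{"\n"}'
oc get node <node-name> -o jsonpath='{range .status.addresses[*]}{.type}={.address}{"\n"}{end}'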

Version-Release number of selected component (if applicable):

4.14

How reproducible:

100% given enough time and the right environment.

Steps to Reproduce:

1. create a hypershift kubevirt guest cluster
2. run the csi conformance test suite in a loop (this test suite causes the vmi.status.interfaces list to become unstable briefly at times)

Actual results:

The csi test suite will eventually begin failing due to the inability to pod exec into worker node pods. This is caused by the node's internal IP being removed.

Expected results:

csi conformance should pass reliably

Additional info:

 

Description of problem:

After destroying the private cluster, the cluster's DNS records are left behind.

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2023-02-26-022418 
4.13.0-0.nightly-2023-02-26-081527 

How reproducible:

always

Steps to Reproduce:

1.create a private cluster
2.destroy the cluster
3.check the dns record  
$ibmcloud dns zones | grep private-ibmcloud.qe.devcluster.openshift.com (base_domain)
3c7af30d-cc2c-4abc-94e1-3bcb36e01a9b   private-ibmcloud.qe.devcluster.openshift.com     PENDING_NETWORK_ADD
$zone_id=3c7af30d-cc2c-4abc-94e1-3bcb36e01a9b
$ibmcloud dns resource-records $zone_id
CNAME:520c532f-ca61-40eb-a04e-1a2569c14a0b   api-int.ci-op-wkb4fgd6-eef7e.private-ibmcloud.qe.devcluster.openshift.com   CNAME   60    10a7a6c7-jp-tok.lb.appdomain.cloud   
CNAME:751cf3ce-06fc-4daf-8a44-bf1a8540dc60   api.ci-op-wkb4fgd6-eef7e.private-ibmcloud.qe.devcluster.openshift.com       CNAME   60    10a7a6c7-jp-tok.lb.appdomain.cloud   
CNAME:dea469e3-01cd-462f-85e3-0c1e6423b107   *.apps.ci-op-wkb4fgd6-eef7e.private-ibmcloud.qe.devcluster.openshift.com    CNAME   120   395ec2b3-jp-tok.lb.appdomain.cloud 

Actual results:

the dns records of the cluster were left

Expected results:

DNS records created by the installer are all deleted after the cluster is destroyed.

Additional info:

This blocks creating private clusters later, because the maximum limit of 5 wildcard records is easily reached (QE account limitation).
checking the *ingress-operator.log of the failed cluster, got the error: "createOrUpdateDNSRecord: failed to create the dns record: Reached the maximum limit of 5 wildcard records."

This is a clone of issue OCPBUGS-17682. The following is the description of the original issue:

Description of problem:

since in-cluster prometheus-operator and UWM prometheus-operator pods are scheduled to master nodes, see from

https://github.com/openshift/cluster-monitoring-operator/blob/release-4.14/assets/prometheus-operator/deployment.yaml#L88-L97

https://github.com/openshift/cluster-monitoring-operator/blob/release-4.14/assets/prometheus-operator-user-workload/deployment.yaml#L91-L103

Enabled UWM and added topologySpreadConstraints for both the in-cluster prometheus-operator and the UWM prometheus-operator (topologyKey set to node-role.kubernetes.io/master); the topologySpreadConstraints take effect for the in-cluster prometheus-operator, but not for the UWM prometheus-operator.

apiVersion: v1
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusOperator:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: node-role.kubernetes.io/master
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-operator
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring

in-cluster prometheus-operator, topologySpreadConstraints settings are loaded to prometheus-operator pod and deployment, see

$ oc -n openshift-monitoring get deploy prometheus-operator -oyaml | grep topologySpreadConstraints -A7
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-operator
        maxSkew: 1
        topologyKey: node-role.kubernetes.io/master
        whenUnsatisfiable: DoNotSchedule
      volumes:

$ oc -n openshift-monitoring get pod -l app.kubernetes.io/name=prometheus-operator -o wide
NAME                                   READY   STATUS    RESTARTS   AGE    IP            NODE                                                 NOMINATED NODE   READINESS GATES
prometheus-operator-65496d5b78-fb9nq   2/2     Running   0          105s   10.128.0.71   juzhao-0813-szb9h-master-0.c.openshift-qe.internal   <none>           <none>

$ oc -n openshift-monitoring get pod prometheus-operator-65496d5b78-fb9nq -oyaml | grep topologySpreadConstraints -A7
    topologySpreadConstraints:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/name: prometheus-operator
      maxSkew: 1
      topologyKey: node-role.kubernetes.io/master
      whenUnsatisfiable: DoNotSchedule
    volumes: 

but the topologySpreadConstraints settings are not loaded to UWM prometheus-operator pod and deployment

$ oc -n openshift-user-workload-monitoring get cm user-workload-monitoring-config -oyaml
apiVersion: v1
data:
  config.yaml: |
    prometheusOperator:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: node-role.kubernetes.io/master
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: prometheus-operator
kind: ConfigMap
metadata:
  creationTimestamp: "2023-08-14T08:10:49Z"
  labels:
    app.kubernetes.io/managed-by: cluster-monitoring-operator
    app.kubernetes.io/part-of: openshift-monitoring
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
  resourceVersion: "212490"
  uid: 048f91cb-4da6-4b1b-9e1f-c769096ab88c

$ oc -n openshift-user-workload-monitoring get deploy prometheus-operator -oyaml | grep topologySpreadConstraints -A7
no result

$ oc -n openshift-user-workload-monitoring get pod -l app.kubernetes.io/name=prometheus-operator
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-77bcdcbd9c-m5x8z   2/2     Running   0          15m

$ oc -n openshift-user-workload-monitoring get pod prometheus-operator-77bcdcbd9c-m5x8z -oyaml | grep topologySpreadConstraints
no result 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

always

Steps to Reproduce:

1. see the description
2.
3.

Actual results:

topologySpreadConstraints settings are not loaded to UWM prometheus-operator pod and deployment

Expected results:

topologySpreadConstraints settings loaded to UWM prometheus-operator pod and deployment

Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/75

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Our telemetry test using remote write is increasingly flaky. The recurring error is:

TestTelemeterRemoteWrite
    telemeter_test.go:103: timed out waiting for the condition: error validating response body "{\"status\":\"success\",\"data\":{\"resultType\":\"vector\",\"result\":[{\"metric\":{\"container\":\"kube-rbac-proxy\",\"endpoint\":\"metrics\",\"job\":\"prometheus-k8s\",\"namespace\":\"openshift-monitoring\",\"remote_name\":\"2bdd72\",\"service\":\"prometheus-k8s\",\"url\":\"https://infogw.api.openshift.com/metrics/v1/receive\"},\"value\":[1684889572.197,\"20.125925925925927\"]}]}}" for query "max without(pod,instance) (rate(prometheus_remote_storage_samples_failed_total{job=\"prometheus-k8s\",url=~\"https://infogw.api.openshift.com.+\"}[5m]))": expecting Prometheus remote write to see no failed samples but got 20.125926

Any failed samples will cause this test to fail. This is perhaps too strict a requirement. We could consider it good enough if some samples are sent successfully. The current version tests telemeter behavior on top of CMO behavior.

Description of problem:

Using openshift-install v4.13.0, no issue messages are displayed to console.
Looking at /etc/issue.d/, the issue files are created; they are just not displayed by agetty.
# cat /etc/issue.d/70_agent-services.issue
\e{cyan}Waiting for services:\e{reset}
[\e{cyan}start\e{reset}] Service that starts cluster installation

Version-Release number of selected component (if applicable):

4.13

How reproducible:

100%

Steps to Reproduce:

1. Build agent image using openshift-install v4.13.0
2. Mount the ISO and boot a machine
3. Wait for a while until issues are created in /etc/issue.d/ 

Actual results:

No messages are displayed to console

Expected results:

All messages should be displayed

Additional info:

https://redhat-internal.slack.com/archives/C02SPBZ4GPR/p1686646256441329

 

Description of problem:

We have seen unit tests flaking on the mapping within the OnDelete policy tests for the control plane machine set.

It turns out there is a race condition: given the right timing, if a reconcile is in progress while a machine is marked for deletion, the load balancing part of the algorithm fails to apply properly.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

While installing OCP on AWS, the user can set metadataService auth to Required in order to use IMDSv2; in that case all of the VMs are required to use it.
Currently the bootstrap node always runs with Optional, which can be blocked on the user's AWS account and will fail the installation process.

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

Install aws cluster and set metadataService to Required

Steps to Reproduce:

1.
2.
3.

Actual results:

Bootstrap has IMDSv2 set to optional

Expected results:

All VMs should have IMDSv2 set to Required.
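
A verification sketch reusing the aws CLI pattern from other reports (the bootstrap Name tag of <infra-id>-bootstrap is an assumption about the installer's naming):

aws ec2 describe-instances --filters "Name=tag:Name,Values=<infra-id>-bootstrap" \
  --query 'Reservations[].Instances[].MetadataOptions.HttpTokens' --output text
# "required" means IMDSv2 is enforced; "optional" matches the reported bootstrap behavior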

Additional info:

 

Description of problem:

Alibaba clusters were never declared GA. They are still in TechPreview.
We do not allow upgrades between TechPreview clusters in minor streams (eg 4.12 to 4.13)

To allow a future deprecation and removal of the platform, we will prevent upgrades past 4.13.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

This bug is for backporting OCPNODE-1798 to 4.14.z.

Create an ICSP object using oc create.
With this ICSP present, creating IDMS and ITMS objects on the same cluster should still be allowed.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.create icsp object
2. create idms object  
The ImageDigestMirrorSet "digest-mirror" is invalid: Kind.ImageDigestMirrorSet: Forbidden: can't create ImageDigestMirrorSet when ImageContentSourcePolicy resources exist 

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Running through instructions for a smoke test on 4.14, the DNS record is incorrectly created for the Gateway.  It is missing a trailing dot in the dnsName.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1.Run through the steps in https://github.com/openshift/network-edge-tools/blob/2fd044d110eb737c94c8b86ea878a130cae0d03e/docs/blogs/EnhancedDevPreviewGatewayAPI/GettingStarted.md until the step "oc get dnsrecord -n openshift-ingress"
2. Check the status of the DNS record: "oc get dnsrecord xxx -n openshift-ingress -ojson | jq .status.zones[].conditions"

 

Actual results:

The status shows error conditions with a message like 'The DNS provider failed to ensure the record: googleapi: Error 400: Invalid value for ''entity.change.additions[*.gwapi.apps.ci-ln-3vxsgxb-72292.origin-ci-int-gce.dev.rhcloud.com][A].name'': ''*.gwapi.apps.ci-ln-3vxsgxb-72292.origin-ci-int-gce.dev.rhcloud.com'', invalid'
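
To see the offending value directly, a sketch (spec.dnsName is the DNSRecord field the error refers to; a correct value would normally be a fully qualified name ending in a trailing dot):

oc get dnsrecord -n openshift-ingress -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.dnsName}{"\n"}{end}'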

Expected results:

The status of the DNS record should show a successful publishing of the record.

Additional info:

Backport to 4.13.z

Description of problem:

Create Serverless Function Form is Broken

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always on Master.

Steps to Reproduce:

1. Go to Add Page
2. Click Create Serverless Function form

Actual results:

Form throwing error.

Expected results:

Form should open and submit

Screenshot of Error: https://drive.google.com/file/d/1uyzGHktfr8tEGWPyYkv9ISYI6BhdnK6f/view?usp=sharing

Additional info:

 

Description of problem:

The current version of openshift/cluster-dns-operator vendors Kubernetes 1.26 packages. OpenShift 4.14 is based on Kubernetes 1.27.   

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Check https://github.com/openshift/cluster-dns-operator/blob/release-4.14/go.mod 

Actual results:

Kubernetes packages (k8s.io/api, k8s.io/apimachinery, and k8s.io/client-go) are at version v0.26

Expected results:

Kubernetes packages are at version v0.27.0 or later.

Additional info:

Using old Kubernetes API and client packages brings risk of API compatibility issues.
controller-runtime will need to be bumped to v0.15.0 as well
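
A sketch of the typical bump commands (module paths are the standard k8s.io ones; v0.27.0 is the minimum from the expected results and controller-runtime v0.15.0 is its matching release):

go get k8s.io/api@v0.27.0 k8s.io/apimachinery@v0.27.0 k8s.io/client-go@v0.27.0
go get sigs.k8s.io/controller-runtime@v0.15.0
go mod tidy && go mod vendor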

Description of problem:

4.12.0-0.nightly-2022-09-20-095559 fresh cluster: the alertmanager pod restarted once before becoming ready. This is a 4.12 regression; we should make sure /etc/alertmanager/config_out/alertmanager.env.yaml exists first.

# oc -n openshift-monitoring get pod
NAME                                                     READY   STATUS    RESTARTS       AGE
alertmanager-main-0                                      6/6     Running   1 (118m ago)   118m
alertmanager-main-1                                      6/6     Running   1 (118m ago)   118m
...

# oc -n openshift-monitoring describe pod alertmanager-main-0 
...
Containers:
  alertmanager:
    Container ID:  cri-o://31b6f3231f5a24fe85188b8b8e26c45b660ebc870ee6915919031519d493d7f8
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:34003d434c6f07e4af6e7a52e94f703c68e1f881e90939702c764729e2b513aa
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:34003d434c6f07e4af6e7a52e94f703c68e1f881e90939702c764729e2b513aa
    Ports:         9094/TCP, 9094/UDP
    Host Ports:    0/TCP, 0/UDP
    Args:
      --config.file=/etc/alertmanager/config_out/alertmanager.env.yaml
      --storage.path=/alertmanager
      --data.retention=120h
      --cluster.listen-address=[$(POD_IP)]:9094
      --web.listen-address=127.0.0.1:9093
      --web.external-url=https:/console-openshift-console.apps.qe-daily1-412-0922.qe.azure.devcluster.openshift.com/monitoring
      --web.route-prefix=/
      --cluster.peer=alertmanager-main-0.alertmanager-operated:9094
      --cluster.peer=alertmanager-main-1.alertmanager-operated:9094
      --cluster.reconnect-timeout=5m
      --web.config.file=/etc/alertmanager/web_config/web-config.yaml
    State:       Running
      Started:   Wed, 21 Sep 2022 19:40:14 -0400
    Last State:  Terminated
      Reason:    Error
      Message:   s=2022-09-21T23:40:06.507Z caller=main.go:231 level=info msg="Starting Alertmanager" version="(version=0.24.0, branch=rhaos-4.12-rhel-8, revision=4efb3c1f9bc32ba0cce7dd163a639ca8759a4190)"
ts=2022-09-21T23:40:06.507Z caller=main.go:232 level=info build_context="(go=go1.18.4, user=root@b2df06f7fbc3, date=20220916-18:08:09)"
ts=2022-09-21T23:40:07.119Z caller=cluster.go:260 level=warn component=cluster msg="failed to join cluster" err="2 errors occurred:\n\t* Failed to resolve alertmanager-main-0.alertmanager-operated:9094: lookup alertmanager-main-0.alertmanager-operated on 172.30.0.10:53: no such host\n\t* Failed to resolve alertmanager-main-1.alertmanager-operated:9094: lookup alertmanager-main-1.alertmanager-operated on 172.30.0.10:53: no such host\n\n"
ts=2022-09-21T23:40:07.119Z caller=cluster.go:262 level=info component=cluster msg="will retry joining cluster every 10s"
ts=2022-09-21T23:40:07.119Z caller=main.go:329 level=warn msg="unable to join gossip mesh" err="2 errors occurred:\n\t* Failed to resolve alertmanager-main-0.alertmanager-operated:9094: lookup alertmanager-main-0.alertmanager-operated on 172.30.0.10:53: no such host\n\t* Failed to resolve alertmanager-main-1.alertmanager-operated:9094: lookup alertmanager-main-1.alertmanager-operated on 172.30.0.10:53: no such host\n\n"
ts=2022-09-21T23:40:07.119Z caller=cluster.go:680 level=info component=cluster msg="Waiting for gossip to settle..." interval=2s
ts=2022-09-21T23:40:07.173Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/etc/alertmanager/config_out/alertmanager.env.yaml
ts=2022-09-21T23:40:07.174Z caller=coordinator.go:118 level=error component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config_out/alertmanager.env.yaml err="open /etc/alertmanager/config_out/alertmanager.env.yaml: no such file or directory"
ts=2022-09-21T23:40:07.174Z caller=cluster.go:689 level=info component=cluster msg="gossip not settled but continuing anyway" polls=0 elapsed=54.469985ms      Exit Code:    1
      Started:      Wed, 21 Sep 2022 19:40:06 -0400
      Finished:     Wed, 21 Sep 2022 19:40:07 -0400
    Ready:          True
    Restart Count:  1
    Requests:
      cpu:     4m
      memory:  40Mi
    Startup:   exec [sh -c exec curl --fail http://localhost:9093/-/ready] delay=20s timeout=3s period=10s #success=1 #failure=40
...

# oc -n openshift-monitoring exec -c alertmanager alertmanager-main-0 -- cat /etc/alertmanager/config_out/alertmanager.env.yaml
"global":
  "resolve_timeout": "5m"
"inhibit_rules":
- "equal":
  - "namespace"
  - "alertname"
  "source_matchers":
  - "severity = critical"
  "target_matchers":
  - "severity =~ warning|info"
- "equal":
  - "namespace"
  - "alertname"

...

Version-Release number of selected component (if applicable):

# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-09-20-095559   True        False         109m    Cluster version is 4.12.0-0.nightly-2022-09-20-095559

How reproducible:

always

Steps to Reproduce:

1. see the steps
2.
3.

Actual results:

alertmanager pod restarted once to become ready

Expected results:

no restart

Additional info:

no issue with 4.11

# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-09-20-140029   True        False         16m     Cluster version is 4.11.0-0.nightly-2022-09-20-140029
# oc -n openshift-monitoring get pod | grep alertmanager-main
alertmanager-main-0                                      6/6     Running   0          54m
alertmanager-main-1                                      6/6     Running   0          55m 

SB and NB containers have this command to expose their DB via SSL and set the inactivity probe interval. With OVN-IC we don't use SSL for the DBs anymore, so we can remove that bit.

if ! retry 60 "inactivity-probe" "ovn-sbctl --no-leader-only -t 5 set-connection pssl:.OVN_SB_PORT.LISTEN_DUAL_STACK -- set connection . inactivity_probe=.OVN_CONTROLLER_INACTIVITY_PROBE"; then

should become:

if ! retry 60 "inactivity-probe" "ovn-sbctl --no-leader-only -t 5 set connection . inactivity_probe=.OVN_CONTROLLER_INACTIVITY_PROBE"; then

Also, we can clean up the comment at the end where it polls the IPsec status, which is just a way of making sure the DB is ready and answering queries. We don't need to wait for the cluster to converge (since there's no RAFT), but we could change it to:

"Kill some time while DB becomes ready by checking IPsec status"

Follow up for https://issues.redhat.com/browse/HOSTEDCP-969

Create metrics and grafana panel in

https://hypershift-monitoring.homelab.sjennings.me:3000/d/PGCTmCL4z/hypershift-slos-slis-alberto-playground?orgId=1&from=now-24h&to=now

https://github.com/openshift/hypershift/tree/main/contrib/metrics

for NodePool internal SLOs/SLIs:

  • NodePoolDeletionDuration
  • NodePoolInitialRolloutDuration

Move existing metrics when possible from metrics loop into nodepool controller:

- nodePoolSize

Explore and discuss granular metrics to track NodePool lifecycle bottlenecks: infra, ignition, node networking, node availability. Consolidate that with the hostedClusterTransitionSeconds metrics and dashboard panels.

Explore and discuss metrics for upgrade duration SLO for both HC and NodePool.

Description of the problem:

Some validations are only related to agents that are bound to clusters.  We had a case where an agent couldn't be bound due to failing validations, and the irrelevant validations added unnecessary noise.  I attached the relevant agent CR to the ticket.  You can see in the Conditions:

  - lastTransitionTime: "2023-01-26T21:00:29Z"
    message: 'The agent''s validations are failing: Validation pending - no cluster,Host
      couldn''t synchronize with any NTP server,Missing inventory, or missing cluster'
    reason: ValidationsFailing
    status: "False"
    type: Validated

The only relevant validation is that there is no NTP server.  "no cluster" and "Missing inventory, or missing cluster" are misleading.

How reproducible:

100%

Steps to reproduce:

1. Boot an unbound agent

2. Look at the CR

Actual results:

All validations are shown in the CR

Expected results:

Only relevant validations are shown in the CR

Based on a suggestion from Omer

"Now that we can tell apart user manifests from our own service manifests, I think it's best that this function deletes the service manifests.

https://github.com/openshift/assisted-service/blob/master/internal/cluster/cluster.go#L1418

The original motivation for this skip was that we didn't want to destroy user uploaded manifests when the user resets their installation, but preserving the service generated ones is useless, and was just an unfortunate side-effect of protecting the user manifests. The service ones would anyway get regenerated when the user hits install again, there's no point in protecting them. If anything, clearing those manifests I think this might solve some edge case bugs I can think of"

We will need to wait for https://github.com/openshift/assisted-service/pull/5278/files to be merged before starting this as this depends on changes made in this PR

In Hypershift CI, we see nil deref panic

I0801 06:35:38.203019       1 controller.go:182] Assigning key: ip-10-0-132-175.ec2.internal to node workqueue
E0801 06:35:38.567021       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 195 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x28103a0?, 0x47a6400})
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00088f260?})
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x28103a0, 0x47a6400})
	/usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*AWS).getSubnet(0xc000c05220, 0xc000d760b0)
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/aws.go:266 +0x24a
github.com/openshift/cloud-network-config-controller/pkg/cloudprovider.(*AWS).GetNodeEgressIPConfiguration(0x0?, 0x31b8490?, {0x0, 0x0, 0x0})
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/cloudprovider/aws.go:200 +0x185
github.com/openshift/cloud-network-config-controller/pkg/controller/node.(*NodeController).SyncHandler(0xc000d526e0, {0xc00005d7e0, 0x1c})
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/node/node_controller.go:129 +0x44f
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem.func1(0xc00071f740, {0x25ff720?, 0xc00088f260?})
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:152 +0x11c
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).processNextWorkItem(0xc00071f740)
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:162 +0x46
github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).runWorker(...)
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:113
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x318e140, 0xc0005aa1e0}, 0x1, 0xc0000c4ba0)
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
	/go/src/github.com/openshift/cloud-network-config-controller/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x25
created by github.com/openshift/cloud-network-config-controller/pkg/controller.(*CloudNetworkConfigController).Run
	/go/src/github.com/openshift/cloud-network-config-controller/pkg/controller/controller.go:99 +0x3aa
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x236d14a]

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn/1686255525022404608/artifacts/e2e-aws-ovn/run-e2e/artifacts/TestNodePool_PreTeardownClusterDump/namespaces/e2e-clusters-m222b-example-85hhk/core/pods/logs/cloud-network-config-controller-6984cd6dcb-l7pcx-controller-previous.log

https://github.com/openshift/cloud-network-config-controller/blob/master/pkg/cloudprovider/aws.go#L266

Code does an unprotected deref of `networkInterface.SubnetId` which appears to be `nil`, which is probably why multiple subnets are returned in the first place.

Description of problem:

After configuring a custom toleration for the DNS pods (one that does not tolerate the master node taint), a dns-default pod gets stuck in the Pending state.

Version-Release number of selected component (if applicable):

 

How reproducible:

https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-41050

Steps to Reproduce:

1. melvinjoseph@mjoseph-mac Downloads % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-05-03-163151   True        False         4h5m    Cluster version is 4.14.0-0.nightly-2023-05-03-163151
2. check default dns pods placement
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod -owide
NAME                  READY   STATUS    RESTARTS   AGE     IP            NODE                                                       NOMINATED NODE   READINESS GATES
dns-default-6cv9k     2/2     Running   0          4h12m   10.131.0.8    shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal   <none>           <none>
dns-default-8g2w8     2/2     Running   0          4h12m   10.129.2.5    shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal   <none>           <none>
dns-default-df7zj     2/2     Running   0          4h18m   10.128.0.40   shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal         <none>           <none>
dns-default-kmv4c     2/2     Running   0          4h18m   10.130.0.9    shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal         <none>           <none>
dns-default-lxxkt     2/2     Running   0          4h18m   10.129.0.11   shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal         <none>           <none>
dns-default-mjrnx     2/2     Running   0          4h11m   10.128.2.4    shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal   <none>           <none>
node-resolver-5bnjv   1/1     Running   0          4h12m   10.0.128.3    shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal   <none>           <none>
node-resolver-7ns8b   1/1     Running   0          4h18m   10.0.0.4      shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal         <none>           <none>
node-resolver-bz7k5   1/1     Running   0          4h12m   10.0.128.2    shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal   <none>           <none>
node-resolver-c67mw   1/1     Running   0          4h18m   10.0.0.3      shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal         <none>           <none>
node-resolver-d8h65   1/1     Running   0          4h12m   10.0.128.4    shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal   <none>           <none>
node-resolver-rgb92   1/1     Running   0          4h18m   10.0.0.5      shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal         <none>           <none>

3. oc -n openshift-dns get ds/dns-default -oyaml
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists

melvinjoseph@mjoseph-mac Downloads % oc get dns.operator default -oyaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  creationTimestamp: "2023-05-08T00:39:00Z"
  finalizers:
  - dns.operator.openshift.io/dns-controller
  generation: 1
  name: default
  resourceVersion: "22893"
  uid: ae53e756-42a3-4c9d-8284-524df006382d
spec:
  cache:
    negativeTTL: 0s
    positiveTTL: 0s
  logLevel: Normal
  nodePlacement: {}
  operatorLogLevel: Normal
  upstreamResolvers:
    policy: Sequential
    transportConfig: {}
    upstreams:
    - port: 53
      type: SystemResolvConf
status:
  clusterDomain: cluster.local
  clusterIP: 172.30.0.10
  conditions:
  - lastTransitionTime: "2023-05-08T00:46:20Z"
    message: Enough DNS pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2023-05-08T00:46:20Z"
    message: All DNS and node-resolver pods are available, and the DNS service has
      a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Progressing
  - lastTransitionTime: "2023-05-08T00:39:25Z"
    message: The DNS daemonset has available pods, and the DNS service has a cluster
      IP address.
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2023-05-08T00:39:01Z"
    message: DNS Operator can be upgraded
    reason: AsExpected
    status: "True"
    type: Upgradeable


4. Configure custom tolerations for the DNS pods (so they do not tolerate the master node taint):
 $ oc edit dns.operator default
 spec:
   nodePlacement:
     tolerations:
     - effect: NoExecute
       key: my-dns-test
       operators: Equal
       value: abc
       tolerationSeconds: 3600 
melvinjoseph@mjoseph-mac Downloads % oc edit dns.operator default
Warning: unknown field "spec.nodePlacement.tolerations[0].operators"
dns.operator.openshift.io/default edited
melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod -owide
NAME                  READY   STATUS    RESTARTS   AGE     IP            NODE                                                       NOMINATED NODE   READINESS GATES
dns-default-6cv9k     2/2     Running   0          5h16m   10.131.0.8    shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal   <none>           <none>
dns-default-8g2w8     2/2     Running   0          5h16m   10.129.2.5    shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal   <none>           <none>
dns-default-df7zj     2/2     Running   0          5h22m   10.128.0.40   shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal         <none>           <none>
dns-default-kmv4c     2/2     Running   0          5h22m   10.130.0.9    shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal         <none>           <none>
dns-default-lxxkt     2/2     Running   0          5h22m   10.129.0.11   shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal         <none>           <none>
dns-default-mjrnx     2/2     Running   0          5h16m   10.128.2.4    shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal   <none>           <none>
dns-default-xqxr9     0/2     Pending   0          7s      <none>        <none>                                                     <none>           <none>
node-resolver-5bnjv   1/1     Running   0          5h17m   10.0.128.3    shudi-gcp4h-whdkl-worker-a-qnvjw.c.openshift-qe.internal   <none>           <none>
node-resolver-7ns8b   1/1     Running   0          5h22m   10.0.0.4      shudi-gcp4h-whdkl-master-1.c.openshift-qe.internal         <none>           <none>
node-resolver-bz7k5   1/1     Running   0          5h16m   10.0.128.2    shudi-gcp4h-whdkl-worker-c-b8qrq.c.openshift-qe.internal   <none>           <none>
node-resolver-c67mw   1/1     Running   0          5h22m   10.0.0.3      shudi-gcp4h-whdkl-master-2.c.openshift-qe.internal         <none>           <none>
node-resolver-d8h65   1/1     Running   0          5h16m   10.0.128.4    shudi-gcp4h-whdkl-worker-b-scqdh.c.openshift-qe.internal   <none>           <none>
node-resolver-rgb92   1/1     Running   0          5h22m   10.0.0.5      shudi-gcp4h-whdkl-master-0.c.openshift-qe.internal         <none>           <none>


The new DNS pod is stuck in the Pending state.

melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get ds/dns-default -oyaml
<-----snip--->
      tolerations:
      - effect: NoExecute
        key: my-dns-test
        tolerationSeconds: 3600
        value: abc
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: dns-default
        name: config-volume
      - name: metrics-tls
        secret:
          defaultMode: 420
          secretName: dns-default-metrics-tls
  updateStrategy:
    rollingUpdate:
      maxSurge: 10%
      maxUnavailable: 0
    type: RollingUpdate
status:
  currentNumberScheduled: 3
  desiredNumberScheduled: 3
  numberAvailable: 3
  numberMisscheduled: 3
  numberReady: 3
  observedGeneration: 2


melvinjoseph@mjoseph-mac Downloads % oc get dns.operator default -oyaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  creationTimestamp: "2023-05-08T00:39:00Z"
  finalizers:
  - dns.operator.openshift.io/dns-controller
  generation: 2
  name: default
  resourceVersion: "125435"
  uid: ae53e756-42a3-4c9d-8284-524df006382d
spec:
  cache:
    negativeTTL: 0s
    positiveTTL: 0s
  logLevel: Normal
  nodePlacement:
    tolerations:
    - effect: NoExecute
      key: my-dns-test
      tolerationSeconds: 3600
      value: abc
  operatorLogLevel: Normal
  upstreamResolvers:
    policy: Sequential
    transportConfig: {}
    upstreams:
    - port: 53
      type: SystemResolvConf
status:
  clusterDomain: cluster.local
  clusterIP: 172.30.0.10
  conditions:
  - lastTransitionTime: "2023-05-08T00:46:20Z"
    message: Enough DNS pods are available, and the DNS service has a cluster IP address.
    reason: AsExpected
    status: "False"
    type: Degraded
  - lastTransitionTime: "2023-05-08T06:01:52Z"
    message: Have 0 up-to-date DNS pods, want 3.
    reason: Reconciling
    status: "True"
    type: Progressing
  - lastTransitionTime: "2023-05-08T00:39:25Z"
    message: The DNS daemonset has available pods, and the DNS service has a cluster
      IP address.
    reason: AsExpected
    status: "True"
    type: Available
  - lastTransitionTime: "2023-05-08T00:39:01Z"
    message: DNS Operator can be upgraded
    reason: AsExpected
    status: "True"
    type: Upgradeable


melvinjoseph@mjoseph-mac Downloads % oc -n openshift-dns get pod                  
NAME                  READY   STATUS    RESTARTS   AGE
dns-default-6cv9k     2/2     Running   0          5h18m
dns-default-8g2w8     2/2     Running   0          5h18m
dns-default-df7zj     2/2     Running   0          5h25m
dns-default-kmv4c     2/2     Running   0          5h25m
dns-default-lxxkt     2/2     Running   0          5h25m
dns-default-mjrnx     2/2     Running   0          5h18m
dns-default-xqxr9     0/2     Pending   0          2m12s
node-resolver-5bnjv   1/1     Running   0          5h19m
node-resolver-7ns8b   1/1     Running   0          5h25m
node-resolver-bz7k5   1/1     Running   0          5h19m
node-resolver-c67mw   1/1     Running   0          5h25m
node-resolver-d8h65   1/1     Running   0          5h19m
node-resolver-rgb92   1/1     Running   0          5h25m

Actual results:

The DNS pod dns-default-xqxr9 is stuck in the Pending state.

Expected results:

The DNS pods should be rolled out again with the updated tolerations.

Additional info:

melvinjoseph@mjoseph-mac Downloads % oc describe po/dns-default-xqxr9  -n openshift-dns
Name:                 dns-default-xqxr9
Namespace:            openshift-dns
Priority:             2000001000


<----snip--->
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 my-dns-test=abc:NoExecute for 3600s
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason            Age    From               Message
  ----     ------            ----   ----               -------
  Warning  FailedScheduling  3m45s  default-scheduler  0/6 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/6 nodes are available: 1 Preemption is not helpful for scheduling, 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 3 node(s) didn't match Pod's node affinity/selector..
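
For reference, the API server warning above shows that the misspelled field `operators` was dropped (the toleration then falls back to the default `Equal` operator), and the FailedScheduling event shows the surge pod has no toleration for the node-role.kubernetes.io/master taint. A minimal sketch of the spelling-corrected spec, with an optional master-taint toleration if DNS pods are meant to keep running on control-plane nodes, is shown below; this is illustrative only, not the fix for the stuck rollout:

 spec:
   nodePlacement:
     tolerations:
     - effect: NoExecute
       key: my-dns-test
       operator: Equal          # singular "operator"; "operators" is rejected as an unknown field
       value: abc
       tolerationSeconds: 3600
     - effect: NoSchedule       # optional: keeps DNS pods schedulable on control-plane nodes
       key: node-role.kubernetes.io/master
       operator: Exists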

This is a clone of issue OCPBUGS-17589. The following is the description of the original issue:

This bug has been seen during the analysis of another issue

If the Server Internal IP is not defined, the cluster-baremetal-operator (CBO) crashes because the nil case is not handled in https://github.com/openshift/cluster-baremetal-operator/blob/release-4.12/provisioning/utils.go#L99

 

I0809 17:33:09.683265       1 provisioning_controller.go:540] No Machines with cluster-api-machine-role=master found, set provisioningMacAddresses if the metal3 pod fails to start

I0809 17:33:09.690304       1 clusteroperator.go:217] "new CO status" reason=SyncingResources processMessage="Applying metal3 resources" message=""

I0809 17:33:10.488862       1 recorder_logging.go:37] &Event{ObjectMeta:{dummy.1779c769624884f4  dummy    0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] []  []},InvolvedObject:ObjectReference{Kind:Pod,Namespace:dummy,Name:dummy,UID:,APIVersion:v1,ResourceVersion:,FieldPath:,},Reason:ValidatingWebhookConfigurationUpdated,Message:Updated ValidatingWebhookConfiguration.admissionregistration.k8s.io/baremetal-operator-validating-webhook-configuration because it changed,Source:EventSource{Component:,Host:,},FirstTimestamp:2023-08-09 17:33:10.488745204 +0000 UTC m=+5.906952556,LastTimestamp:2023-08-09 17:33:10.488745204 +0000 UTC m=+5.906952556,Count:1,Type:Normal,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}

panic: runtime error: invalid memory address or nil pointer dereference

[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1768fd4]

 

goroutine 574 [running]:

github.com/openshift/cluster-baremetal-operator/provisioning.getServerInternalIP({0x1e774d0?, 0xc0001e8fd0?})

        /go/src/github.com/openshift/cluster-baremetal-operator/provisioning/utils.go:75 +0x154

github.com/openshift/cluster-baremetal-operator/provisioning.GetIronicIP({0x1ea2378?, 0xc000856840?}, {0x1bc1f91, 0x15}, 0xc0004c4398, {0x1e774d0, 0xc0001e8fd0})

        /go/src/github.com/openshift/cluster-baremetal-operator/provisioning/utils.go:98 +0xfb

Description of problem:

The console pods hit a panic when a duplicate entry is set in spec.plugins.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2022-12-19-122634

How reproducible:

Always

Steps to Reproduce:

1. Create console-demo-plugin manifests
$ oc apply -f dynamic-demo-plugin/oc-manifest.yaml 
namespace/console-demo-plugin created
deployment.apps/console-demo-plugin created
service/console-demo-plugin created
consoleplugin.console.openshift.io/console-demo-plugin created 
2.Enable console-demo-plugin
$ oc patch consoles.operator.openshift.io cluster --patch '{ "spec": { "plugins": ["console-demo-plugin"] } }' --type=merge 
console.operator.openshift.io/cluster patched
3. Add a duplicate entry in spec.plugins in consoles.operator/cluster 
$ oc patch consoles.operator.openshift.io cluster --patch '{ "spec": { "plugins": ["console-demo-plugin", "console-demo-plugin"] } }' --type=merge
console.operator.openshift.io/cluster patched
$ oc get consoles.operator cluster -o json | jq .spec.plugins
[
  "console-demo-plugin",
  "console-demo-plugin"
]
4. check console pods status
$ oc get pods -n openshift-console                        
NAME                         READY   STATUS             RESTARTS      AGE
console-6bcc87c7b4-6g2cf     0/1     CrashLoopBackOff   1 (21s ago)   50s
console-6bcc87c7b4-9g6kk     0/1     CrashLoopBackOff   3 (3s ago)    50s
console-7dc78ffd78-sxvcv     1/1     Running            0             2m58s
downloads-758fc74758-9k426   1/1     Running            0             3h18m
downloads-758fc74758-k4q72   1/1     Running            0             3h21m

Actual results:

3. console pods will be in CrashLoopBackOff status
$ oc logs console-6bcc87c7b4-9g6kk -n openshift-console
W1220 06:48:37.279871       1 main.go:228] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
I1220 06:48:37.279889       1 main.go:238] The following console plugins are enabled:
I1220 06:48:37.279898       1 main.go:240]  - console-demo-plugin
I1220 06:48:37.279911       1 main.go:354] cookies are secure!
I1220 06:48:37.331802       1 server.go:607] The following console endpoints are now proxied to these services:
I1220 06:48:37.331843       1 server.go:610]  - /api/proxy/plugin/console-demo-plugin/thanos-querier/ -> https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
I1220 06:48:37.331884       1 server.go:610]  - /api/proxy/plugin/console-demo-plugin/thanos-querier/ -> https://thanos-querier.openshift-monitoring.svc.cluster.local:9091
panic: http: multiple registrations for /api/proxy/plugin/console-demo-plugin/thanos-querier/
goroutine 1 [running]:
net/http.(*ServeMux).Handle(0xc0005b6600, {0xc0005d9a40, 0x35}, {0x35aaf60?, 0xc000735260})
    /usr/lib/golang/src/net/http/server.go:2503 +0x239
github.com/openshift/console/pkg/server.(*Server).HTTPHandler.func1({0xc0005d9940?, 0x35?}, {0x35aaf60, 0xc000735260})
    /go/src/github.com/openshift/console/pkg/server/server.go:245 +0x149
github.com/openshift/console/pkg/server.(*Server).HTTPHandler(0xc000056c00)
    /go/src/github.com/openshift/console/pkg/server/server.go:621 +0x330b
main.main()
    /go/src/github.com/openshift/console/cmd/bridge/main.go:785 +0x5ff5

Expected results:

3. The console pods should be running normally

Additional info:

 

 

 

 

Description of problem:

The fix for https://issues.redhat.com/browse/OCPBUGS-15947 seems to have introduced a problem in our keepalived-monitor logic. What I'm seeing is that at some point all of the apiservers became unavailable, which caused haproxy-monitor to drop the redirect firewall rule since it wasn't able to reach the API and we normally want to fall back to direct, un-loadbalanced API connectivity in that case.

However, due to the fix linked above we now short-circuit the keepalived-monitor update loop if we're unable to retrieve the node list, which is what will happen if the node holding the VIP has neither a local apiserver nor the HAProxy firewall rule. Because of this we will also skip updating the status of the firewall rule and thus the keepalived priority for the node won't be dropped appropriately.

Version-Release number of selected component (if applicable):

We backported the fix linked above to 4.11 so I expect this goes back at least that far.

How reproducible:

Unsure. It's clearly not happening every time, but I have a local dev cluster in this state so it can happen.

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

I think the solution here is just to move the firewall rule check earlier in the update loop so it will have run before we try to retrieve nodes. There's no dependency on the ordering of those two steps so I don't foresee any major issues.

To work around this, I believe we can just bounce keepalived on the affected node until the VIP ends up on a node with a local apiserver.

In order to avoid possible issues with SDN during migration from SDN to OVNK, do not use port 9106 for ovnkube-control-plane metrics, since it's already used by SDN. Use a port that is not used by SDN, such as 9108.

This is a clone of issue OCPBUGS-17391. The following is the description of the original issue:

the pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-local-to-shared-gateway-mode-migration job started failing recently when the
ovnkube-master daemonset would not finish rolling out after 360s.

Taking the must-gather to debug (which happens a few minutes after the test failure), you can see that the daemonset is still not ready, so I believe that increasing the timeout is not the answer.

some debug info:

 

static-kas git:(master) oc --kubeconfig=/tmp/kk get daemonsets -A 
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
openshift-cluster-csi-drivers aws-ebs-csi-driver-node 6 6 6 6 6 kubernetes.io/os=linux 8h
openshift-cluster-node-tuning-operator tuned 6 6 6 6 6 kubernetes.io/os=linux 8h
openshift-dns dns-default 6 6 6 6 6 kubernetes.io/os=linux 8h
openshift-dns node-resolver 6 6 6 6 6 kubernetes.io/os=linux 8h
openshift-image-registry node-ca 6 6 6 6 6 kubernetes.io/os=linux 8h
openshift-ingress-canary ingress-canary 3 3 3 3 3 kubernetes.io/os=linux 8h
openshift-machine-api machine-api-termination-handler 0 0 0 0 0 kubernetes.io/os=linux,machine.openshift.io/interruptible-instance= 8h
openshift-machine-config-operator machine-config-daemon 6 6 6 6 6 kubernetes.io/os=linux 8h
openshift-machine-config-operator machine-config-server 3 3 3 3 3 node-role.kubernetes.io/master= 8h
openshift-monitoring node-exporter 6 6 6 6 6 kubernetes.io/os=linux 8h
openshift-multus multus 6 6 6 6 6 kubernetes.io/os=linux 9h
openshift-multus multus-additional-cni-plugins 6 6 6 6 6 kubernetes.io/os=linux 9h
openshift-multus network-metrics-daemon 6 6 6 6 6 kubernetes.io/os=linux 9h
openshift-network-diagnostics network-check-target 6 6 6 6 6 beta.kubernetes.io/os=linux 9h
openshift-ovn-kubernetes ovnkube-master 3 3 2 2 2 beta.kubernetes.io/os=linux,node-role.kubernetes.io/master= 9h
openshift-ovn-kubernetes ovnkube-node 6 6 6 6 6 beta.kubernetes.io/os=linux 9h
Name: ovnkube-master
Selector: app=ovnkube-master
Node-Selector: beta.kubernetes.io/os=linux,node-role.kubernetes.io/master=
Labels: networkoperator.openshift.io/generates-operator-status=stand-alone
Annotations: deprecated.daemonset.template.generation: 3
kubernetes.io/description: This daemonset launches the ovn-kubernetes controller (master) networking components.
networkoperator.openshift.io/cluster-network-cidr: 10.128.0.0/14
networkoperator.openshift.io/hybrid-overlay-status: disabled
networkoperator.openshift.io/ip-family-mode: single-stack
release.openshift.io/version: 4.14.0-0.ci.test-2023-08-04-123014-ci-op-c6fp05f4-latest
Desired Number of Nodes Scheduled: 3
Current Number of Nodes Scheduled: 3
Number of Nodes Scheduled with Up-to-date Pods: 2
Number of Nodes Scheduled with Available Pods: 2
Number of Nodes Misscheduled: 0
Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=ovnkube-master
component=network
kubernetes.io/os=linux
openshift.io/component=network
ovn-db-pod=true
type=infra
Annotations: networkoperator.openshift.io/cluster-network-cidr: 10.128.0.0/14
networkoperator.openshift.io/hybrid-overlay-status: disabled
networkoperator.openshift.io/ip-family-mode: single-stack
target.workload.openshift.io/management:
{"effect": "PreferredDuringScheduling"}
Service Account: ovn-kubernetes-controller

 

it seems there is one pod that is not coming up all the way and that pod has
two containers not ready (sbdb and nbdb). logs from those containers below:

 

static-kas git:(master) oc --kubeconfig=/tmp/kk describe pod ovnkube-master-7qlm5 -n openshift-ovn-kubernetes | rg '^ [a-z].*:|Ready'
northd:
Ready: True
nbdb:
Ready: False
kube-rbac-proxy:
Ready: True
sbdb:
Ready: False
ovnkube-master:
Ready: True
ovn-dbchecker:
Ready: True
➜ static-kas git:(master) oc --kubeconfig=/tmp/kk logs ovnkube-master-7qlm5 -n openshift-ovn-kubernetes -c sbdb
2023-08-04T13:08:49.127480354Z + [[ -f /env/_master ]]
2023-08-04T13:08:49.127562165Z + trap quit TERM INT
2023-08-04T13:08:49.127609496Z + ovn_kubernetes_namespace=openshift-ovn-kubernetes
2023-08-04T13:08:49.127637926Z + ovndb_ctl_ssl_opts='-p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt'
2023-08-04T13:08:49.127637926Z + transport=ssl
2023-08-04T13:08:49.127645167Z + ovn_raft_conn_ip_url_suffix=
2023-08-04T13:08:49.127682687Z + [[ 10.0.42.108 == \: ]]
2023-08-04T13:08:49.127690638Z + db=sb
2023-08-04T13:08:49.127690638Z + db_port=9642
2023-08-04T13:08:49.127712038Z + ovn_db_file=/etc/ovn/ovnsb_db.db
2023-08-04T13:08:49.127854181Z + [[ ! ssl:10.0.102.2:9642,ssl:10.0.42.108:9642,ssl:10.0.74.128:9642 =~ .:10\.0\.42\.108:. ]]
2023-08-04T13:08:49.128199437Z ++ bracketify 10.0.42.108
2023-08-04T13:08:49.128237768Z ++ case "$1" in
2023-08-04T13:08:49.128265838Z ++ echo 10.0.42.108
2023-08-04T13:08:49.128493242Z + OVN_ARGS='--db-sb-cluster-local-port=9644 --db-sb-cluster-local-addr=10.0.42.108 --no-monitor --db-sb-cluster-local-proto=ssl --ovn-sb-db-ssl-key=/ovn-cert/tls.key --ovn-sb-db-ssl-cert=/ovn-cert/tls.crt --ovn-sb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt'
2023-08-04T13:08:49.128535253Z + CLUSTER_INITIATOR_IP=10.0.102.2
2023-08-04T13:08:49.128819438Z ++ date -Iseconds
2023-08-04T13:08:49.130157063Z 2023-08-04T13:08:49+00:00 - starting sbdb CLUSTER_INITIATOR_IP=10.0.102.2
2023-08-04T13:08:49.130170893Z + echo '2023-08-04T13:08:49+00:00 - starting sbdb CLUSTER_INITIATOR_IP=10.0.102.2'
2023-08-04T13:08:49.130170893Z + initialize=false
2023-08-04T13:08:49.130179713Z + [[ ! -e /etc/ovn/ovnsb_db.db ]]
2023-08-04T13:08:49.130318475Z + [[ false == \t\r\u\e ]]
2023-08-04T13:08:49.130406657Z + wait 9
2023-08-04T13:08:49.130493659Z + exec /usr/share/ovn/scripts/ovn-ctl -db-sb-cluster-local-port=9644 --db-sb-cluster-local-addr=10.0.42.108 --no-monitor --db-sb-cluster-local-proto=ssl --ovn-sb-db-ssl-key=/ovn-cert/tls.key --ovn-sb-db-ssl-cert=/ovn-cert/tls.crt --ovn-sb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt '-ovn-sb-log=-vconsole:info -vfile:off -vPATTERN:console:%D
{%Y-%m-%dT%H:%M:%S.###Z}
|%05N|%c%T|%p|%m' run_sb_ovsdb
2023-08-04T13:08:49.208399304Z 2023-08-04T13:08:49.208Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-sb.log
2023-08-04T13:08:49.213507987Z ovn-sbctl: unix:/var/run/ovn/ovnsb_db.sock: database connection failed (No such file or directory)
2023-08-04T13:08:49.224890005Z 2023-08-04T13:08:49Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connecting...
2023-08-04T13:08:49.224912156Z 2023-08-04T13:08:49Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connection attempt failed (No such file or directory)
2023-08-04T13:08:49.255474964Z 2023-08-04T13:08:49.255Z|00002|raft|INFO|local server ID is 7f92
2023-08-04T13:08:49.333342909Z 2023-08-04T13:08:49.333Z|00003|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 3.1.2
2023-08-04T13:08:49.348948944Z 2023-08-04T13:08:49.348Z|00004|reconnect|INFO|ssl:10.0.102.2:9644: connecting...
2023-08-04T13:08:49.349002565Z 2023-08-04T13:08:49.348Z|00005|reconnect|INFO|ssl:10.0.74.128:9644: connecting...
2023-08-04T13:08:49.352510569Z 2023-08-04T13:08:49.352Z|00006|reconnect|INFO|ssl:10.0.102.2:9644: connected
2023-08-04T13:08:49.353870484Z 2023-08-04T13:08:49.353Z|00007|reconnect|INFO|ssl:10.0.74.128:9644: connected
2023-08-04T13:08:49.889326777Z 2023-08-04T13:08:49.889Z|00008|raft|INFO|server 2501 is leader for term 5
2023-08-04T13:08:49.890316765Z 2023-08-04T13:08:49.890Z|00009|raft|INFO|rejecting append_request because previous entry 5,1538 not in local log (mismatch past end of log)
2023-08-04T13:08:49.891199951Z 2023-08-04T13:08:49.891Z|00010|raft|INFO|rejecting append_request because previous entry 5,1539 not in local log (mismatch past end of log)
2023-08-04T13:08:50.225632838Z 2023-08-04T13:08:50Z|00003|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connecting...
2023-08-04T13:08:50.225677739Z 2023-08-04T13:08:50Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnsb_db.sock: connected
2023-08-04T13:08:50.227772827Z Waiting for OVN_Southbound to come up.
2023-08-04T13:08:55.716284614Z 2023-08-04T13:08:55.716Z|00011|raft|INFO|ssl:10.0.74.128:43498: learned server ID 3dff
2023-08-04T13:08:55.716323395Z 2023-08-04T13:08:55.716Z|00012|raft|INFO|ssl:10.0.74.128:43498: learned remote address ssl:10.0.74.128:9644
2023-08-04T13:08:55.724570375Z 2023-08-04T13:08:55.724Z|00013|raft|INFO|ssl:10.0.102.2:47804: learned server ID 2501
2023-08-04T13:08:55.724599466Z 2023-08-04T13:08:55.724Z|00014|raft|INFO|ssl:10.0.102.2:47804: learned remote address ssl:10.0.102.2:9644
2023-08-04T13:08:59.348572779Z 2023-08-04T13:08:59.348Z|00015|memory|INFO|32296 kB peak resident set size after 10.1 seconds
2023-08-04T13:08:59.348648190Z 2023-08-04T13:08:59.348Z|00016|memory|INFO|atoms:35959 cells:31476 monitors:0 n-weak-refs:749 raft-connections:4 raft-log:1543 txn-history:100 txn-history-atoms:7100
➜ static-kas git:(master) oc --kubeconfig=/tmp/kk logs ovnkube-master-7qlm5 -n openshift-ovn-kubernetes -c nbdb 
2023-08-04T13:08:48.779743434Z + [[ -f /env/_master ]]
2023-08-04T13:08:48.779743434Z + trap quit TERM INT
2023-08-04T13:08:48.779825516Z + ovn_kubernetes_namespace=openshift-ovn-kubernetes
2023-08-04T13:08:48.779825516Z + ovndb_ctl_ssl_opts='-p /ovn-cert/tls.key -c /ovn-cert/tls.crt -C /ovn-ca/ca-bundle.crt'
2023-08-04T13:08:48.779825516Z + transport=ssl
2023-08-04T13:08:48.779825516Z + ovn_raft_conn_ip_url_suffix=
2023-08-04T13:08:48.779825516Z + [[ 10.0.42.108 == \: ]]
2023-08-04T13:08:48.779825516Z + db=nb
2023-08-04T13:08:48.779825516Z + db_port=9641
2023-08-04T13:08:48.779825516Z + ovn_db_file=/etc/ovn/ovnnb_db.db
2023-08-04T13:08:48.779887606Z + [[ ! ssl:10.0.102.2:9641,ssl:10.0.42.108:9641,ssl:10.0.74.128:9641 =~ .:10\.0\.42\.108:. ]]
2023-08-04T13:08:48.780159182Z ++ bracketify 10.0.42.108
2023-08-04T13:08:48.780167142Z ++ case "$1" in
2023-08-04T13:08:48.780172102Z ++ echo 10.0.42.108
2023-08-04T13:08:48.780314224Z + OVN_ARGS='--db-nb-cluster-local-port=9643 --db-nb-cluster-local-addr=10.0.42.108 --no-monitor --db-nb-cluster-local-proto=ssl --ovn-nb-db-ssl-key=/ovn-cert/tls.key --ovn-nb-db-ssl-cert=/ovn-cert/tls.crt --ovn-nb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt'
2023-08-04T13:08:48.780314224Z + CLUSTER_INITIATOR_IP=10.0.102.2
2023-08-04T13:08:48.780518588Z ++ date -Iseconds
2023-08-04T13:08:48.781738820Z 2023-08-04T13:08:48+00:00 - starting nbdb CLUSTER_INITIATOR_IP=10.0.102.2, K8S_NODE_IP=10.0.42.108
2023-08-04T13:08:48.781753021Z + echo '2023-08-04T13:08:48+00:00 - starting nbdb CLUSTER_INITIATOR_IP=10.0.102.2, K8S_NODE_IP=10.0.42.108'
2023-08-04T13:08:48.781753021Z + initialize=false
2023-08-04T13:08:48.781753021Z + [[ ! -e /etc/ovn/ovnnb_db.db ]]
2023-08-04T13:08:48.781816342Z + [[ false == \t\r\u\e ]]
2023-08-04T13:08:48.781936684Z + wait 9
2023-08-04T13:08:48.781974715Z + exec /usr/share/ovn/scripts/ovn-ctl -db-nb-cluster-local-port=9643 --db-nb-cluster-local-addr=10.0.42.108 --no-monitor --db-nb-cluster-local-proto=ssl --ovn-nb-db-ssl-key=/ovn-cert/tls.key --ovn-nb-db-ssl-cert=/ovn-cert/tls.crt --ovn-nb-db-ssl-ca-cert=/ovn-ca/ca-bundle.crt '-ovn-nb-log=-vconsole:info -vfile:off -vPATTERN:console:%D
{%Y-%m-%dT%H:%M:%S.###Z}
|%05N|%c%T|%p|%m' run_nb_ovsdb
2023-08-04T13:08:48.851644059Z 2023-08-04T13:08:48.851Z|00001|vlog|INFO|opened log file /var/log/ovn/ovsdb-server-nb.log
2023-08-04T13:08:48.852091247Z ovn-nbctl: unix:/var/run/ovn/ovnnb_db.sock: database connection failed (No such file or directory)
2023-08-04T13:08:48.861365357Z 2023-08-04T13:08:48Z|00001|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2023-08-04T13:08:48.861365357Z 2023-08-04T13:08:48Z|00002|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connection attempt failed (No such file or directory)
2023-08-04T13:08:48.875126148Z 2023-08-04T13:08:48.875Z|00002|raft|INFO|local server ID is c503
2023-08-04T13:08:48.911846610Z 2023-08-04T13:08:48.911Z|00003|ovsdb_server|INFO|ovsdb-server (Open vSwitch) 3.1.2
2023-08-04T13:08:48.918864408Z 2023-08-04T13:08:48.918Z|00004|reconnect|INFO|ssl:10.0.102.2:9643: connecting...
2023-08-04T13:08:48.918934490Z 2023-08-04T13:08:48.918Z|00005|reconnect|INFO|ssl:10.0.74.128:9643: connecting...
2023-08-04T13:08:48.923439162Z 2023-08-04T13:08:48.923Z|00006|reconnect|INFO|ssl:10.0.102.2:9643: connected
2023-08-04T13:08:48.925166154Z 2023-08-04T13:08:48.925Z|00007|reconnect|INFO|ssl:10.0.74.128:9643: connected
2023-08-04T13:08:49.861650961Z 2023-08-04T13:08:49Z|00003|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connecting...
2023-08-04T13:08:49.861747153Z 2023-08-04T13:08:49Z|00004|reconnect|INFO|unix:/var/run/ovn/ovnnb_db.sock: connected
2023-08-04T13:08:49.875272530Z 2023-08-04T13:08:49.875Z|00008|raft|INFO|server fccb is leader for term 6
2023-08-04T13:08:49.875302480Z 2023-08-04T13:08:49.875Z|00009|raft|INFO|rejecting append_request because previous entry 6,1732 not in local log (mismatch past end of log)
2023-08-04T13:08:49.876027164Z Waiting for OVN_Northbound to come up.
2023-08-04T13:08:55.694760761Z 2023-08-04T13:08:55.694Z|00010|raft|INFO|ssl:10.0.74.128:57122: learned server ID d382
2023-08-04T13:08:55.694800872Z 2023-08-04T13:08:55.694Z|00011|raft|INFO|ssl:10.0.74.128:57122: learned remote address ssl:10.0.74.128:9643
2023-08-04T13:08:55.706904913Z 2023-08-04T13:08:55.706Z|00012|raft|INFO|ssl:10.0.102.2:43230: learned server ID fccb
2023-08-04T13:08:55.706931733Z 2023-08-04T13:08:55.706Z|00013|raft|INFO|ssl:10.0.102.2:43230: learned remote address ssl:10.0.102.2:9643
2023-08-04T13:08:58.919567770Z 2023-08-04T13:08:58.919Z|00014|memory|INFO|21944 kB peak resident set size after 10.1 seconds
2023-08-04T13:08:58.919643762Z 2023-08-04T13:08:58.919Z|00015|memory|INFO|atoms:8471 cells:7481 monitors:0 n-weak-refs:200 raft-connections:4 raft-log:1737 txn-history:72 txn-history-atoms:8165
➜ static-kas git:(master)

This seems to happen very frequently now, but it was not happening before approximately July 21st.

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-local-to-shared-gateway-mode-migration?buildId=1684628739427667968

 

Description of problem:

The MCO must have compatibility in place one OCP version in advance if we want to bump the Ignition spec version; otherwise downgrades will fail.

This is NOT needed in 4.14, only 4.13

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1. None atm, this is preventative for the future
2.
3.

Actual results:

N/A

Expected results:

N/A

Additional info:

 

This is a clone of issue OCPBUGS-22293. The following is the description of the original issue:

Description of problem:

Upgrading from 4.13.5 to 4.13.17 fails at network operator upgrade

Version-Release number of selected component (if applicable):

 

How reproducible:

Not sure since we only had one cluster on 4.13.5.

Steps to Reproduce:

1. Have a cluster on version 4.13.5 with OVN-Kubernetes
2. Set desired update image to quay.io/openshift-release-dev/ocp-release@sha256:c1f2fa2170c02869484a4e049132128e216a363634d38abf292eef181e93b692
3. Wait until it reaches network operator

Actual results:

Error message: Error while updating operator configuration: could not apply (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-master: failed to apply / update (apps/v1, Kind=DaemonSet) openshift-ovn-kubernetes/ovnkube-master: DaemonSet.apps "ovnkube-master" is invalid: [spec.template.spec.containers[1].lifecycle.preStop: Required value: must specify a handler type, spec.template.spec.containers[3].lifecycle.preStop: Required value: must specify a handler type]

Expected results:

Network operator upgrades successfully

Additional info:

Since I'm not able to attach files please gather all required debug data from https://access.redhat.com/support/cases/#/case/03645170
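
For context, the validation error above comes from Kubernetes requiring that a lifecycle.preStop stanza name exactly one handler type (exec, httpGet or tcpSocket). A minimal sketch of a valid preStop hook is shown below; the command is a placeholder, not the actual ovnkube-master hook:

        lifecycle:
          preStop:
            exec:
              command:
              - /bin/bash
              - -c
              - echo cleanup   # placeholder; the real container runs its own shutdown script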

 

Please review the following PR: https://github.com/openshift/router/pull/473

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

We need to update the operator to be in sync with the Kubernetes API version used by OCP 4.14. We also need to sync our samples libraries with the latest available libraries. Any deprecated libraries should be removed as well.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-20474. The following is the description of the original issue:

Description of problem:

When mirroring a multiarch release payload through oc adm release mirror --keep-manifest-list --to-image-stream into an image stream of a cluster's internal registry, the cluster does not import the image as a manifest list.


Version-Release number of selected component (if applicable):

4.15

How reproducible:

Always

Steps to Reproduce:

1. oc adm release mirror \
                  --from=quay.io/openshift-release-dev/ocp-release:4.14.0-rc.5-multi \
                  --to-image-stream=release \
                  --keep-manifest-list=true
2. oc get istag release:installer -o yaml
3.

Actual results:

apiVersion: image.openshift.io/v1
generation: 1
image:
  dockerImageLayers:
  - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip
    name: sha256:97da74cc6d8fa5d1634eb1760fd1da5c6048619c264c23e62d75f3bf6b8ef5c4
    size: 79524639
  - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip
    name: sha256:d8190195889efb5333eeec18af9b6c82313edd4db62989bd3a357caca4f13f0e
    size: 1438
  - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip
    name: sha256:09c3f3b6718f2df2ee9cd3a6c2e19ddb73ca777f216d310eaf4e0420407ea7c7
    size: 59044444
  - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip
    name: sha256:cf84754d71b4b704c30abd45668882903e3eaa1355857b605e1dbb25ecf516d7
    size: 11455659
  - mediaType: application/vnd.docker.image.rootfs.diff.tar.gzip
    name: sha256:2e20a50f4b685b3976028637f296ae8839c18a9505b5f58d6e4a0f03984ef1e8
    size: 433281528
  dockerImageManifestMediaType: application/vnd.docker.distribution.manifest.v2+json
  dockerImageMetadata:
    Architecture: amd64
    Config:
      Entrypoint:
      - /bin/openshift-install
      Env:
      - container=oci
      - GODEBUG=x509ignoreCN=0,madvdontneed=1
      - __doozer=merge
      - BUILD_RELEASE=202310100645.p0.gc926532.assembly.stream
      - BUILD_VERSION=v4.15.0
      - OS_GIT_MAJOR=4
      - OS_GIT_MINOR=15
      - OS_GIT_PATCH=0
      - OS_GIT_TREE_STATE=clean
      - OS_GIT_VERSION=4.15.0-202310100645.p0.gc926532.assembly.stream-c926532
      - SOURCE_GIT_TREE_STATE=clean
      - __doozer_group=openshift-4.15
      - __doozer_key=ose-installer
      - OS_GIT_COMMIT=c926532
      - SOURCE_DATE_EPOCH=1696907019
      - SOURCE_GIT_COMMIT=c926532cd50b6ef4974f14dfe3d877a0f7707972
      - SOURCE_GIT_TAG=agent-installer-v4.11.0-dev-preview-2-2165-gc926532cd5
      - SOURCE_GIT_URL=https://github.com/openshift/installer
      - PATH=/bin
      - HOME=/output
      Labels:
        License: GPLv2+
        architecture: x86_64
        build-date: 2023-10-10T10:01:18
        com.redhat.build-host: cpt-1001.osbs.prod.upshift.rdu2.redhat.com
        com.redhat.component: ose-installer-container
        com.redhat.license_terms: https://www.redhat.com/agreements
        description: This is the base image from which all OpenShift Container Platform
          images inherit.
        distribution-scope: public
        io.buildah.version: 1.29.0
        io.k8s.description: This is the base image from which all OpenShift Container
          Platform images inherit.
        io.k8s.display-name: OpenShift Container Platform RHEL 8 Base
        io.openshift.build.commit.id: c926532cd50b6ef4974f14dfe3d877a0f7707972
        io.openshift.build.commit.url: https://github.com/openshift/installer/commit/c926532cd50b6ef4974f14dfe3d877a0f7707972
        io.openshift.build.source-location: https://github.com/openshift/installer
        io.openshift.expose-services: ""
        io.openshift.maintainer.component: Installer / openshift-installer
        io.openshift.maintainer.project: OCPBUGS
        io.openshift.release.operator: "true"
        io.openshift.tags: openshift,base
        maintainer: Red Hat, Inc.
        name: openshift/ose-installer
        release: 202310100645.p0.gc926532.assembly.stream
        summary: Provides the latest release of the Red Hat Extended Life Base Image.
        url: https://access.redhat.com/containers/#/registry.access.redhat.com/openshift/ose-installer/images/v4.15.0-202310100645.p0.gc926532.assembly.stream
        vcs-ref: d40a2800e169f6c2d63897467af22d59933e8811
        vcs-type: git
        vendor: Red Hat, Inc.
        version: v4.15.0
      User: 1000:1000
      WorkingDir: /output
    ContainerConfig: {}
    Created: "2023-10-10T10:59:36Z"
    Id: sha256:ae4c47d3c08de5d57b5d4fa8a30497ac097c05abab4e284c91eae389e512f202
    Size: 583326767
    apiVersion: image.openshift.io/1.0
    kind: DockerImage
  dockerImageMetadataVersion: "1.0"
  dockerImageReference: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:67d35b2185c9f267523f86e54f403d0d2561c9098b7bb81fa3bfd6fd8a121d04
  metadata:
    annotations:
      image.openshift.io/dockerLayersOrder: ascending
    creationTimestamp: "2023-10-11T10:56:53Z"
    name: sha256:67d35b2185c9f267523f86e54f403d0d2561c9098b7bb81fa3bfd6fd8a121d04
    resourceVersion: "740341"
    uid: 17dede63-ca3b-47ad-a157-c78f38c1df7d
kind: ImageStreamTag
lookupPolicy:
  local: true
metadata:
  creationTimestamp: "2023-10-12T09:32:10Z"
  name: release:installer
  namespace: okd-fcos
  resourceVersion: "1329147"
  uid: d6cfcd4d-3f9c-4bb1-bc56-04bf5e926628
tag:
  annotations: null
  from:
    kind: DockerImage
    name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
  generation: 12
  importPolicy:
    importMode: Legacy
  name: installer
  referencePolicy:
    type: Source

Expected results:

apiVersion: image.openshift.io/v1
generation: 12
image:
  dockerImageManifestMediaType: application/vnd.docker.distribution.manifest.list.v2+json
  dockerImageManifests:
  - architecture: amd64
    digest: sha256:67d35b2185c9f267523f86e54f403d0d2561c9098b7bb81fa3bfd6fd8a121d04
    manifestSize: 1087
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: arm64
    digest: sha256:a602c3e4b5f8f747b2813ed2166f366417f638fc6884deecebdb04e18431fcd6
    manifestSize: 1087
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: ppc64le
    digest: sha256:04296057a8f037f20d4b1ca20bcaac5bdca5368cdd711a3f37bd05d66c9fdaec
    manifestSize: 1087
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  - architecture: s390x
    digest: sha256:5fda4ea09bfd2026b7d6acd80441b2b7c51b1cf440fd46e0535a7320b67894fb
    manifestSize: 1087
    mediaType: application/vnd.docker.distribution.manifest.v2+json
    os: linux
  dockerImageMetadata:
    ContainerConfig: {}
    Created: "2023-10-12T09:32:03Z"
    Id: sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
    apiVersion: image.openshift.io/1.0
    kind: DockerImage
  dockerImageMetadataVersion: "1.0"
  dockerImageReference: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
  metadata:
    creationTimestamp: "2023-10-12T09:32:10Z"
    name: sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
    resourceVersion: "1327949"
    uid: 4d78c9ba-12b2-414f-a173-b926ae019ab0
kind: ImageStreamTag
lookupPolicy:
  local: true
metadata:
  creationTimestamp: "2023-10-12T09:32:10Z"
  name: release:installer
  namespace: okd-fcos
  resourceVersion: "1329147"
  uid: d6cfcd4d-3f9c-4bb1-bc56-04bf5e926628
tag:
  annotations: null
  from:
    kind: DockerImage
    name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c510f0e2bd29f7b9bf45146fbc212e893634179cc029cd54a135f05f9ae1df52
  generation: 12
  importPolicy:
    importMode: PreserveOriginal
  name: installer
  referencePolicy:
    type: Source

Additional info:


Description of problem:

Tested on GCP: there are 4 failureDomains (a, b, c, f) in the CPMS. After removing a, a new master is created in f. If a is re-added to the CPMS, the instance is moved back from f to a.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

Before updating the CPMS:
      failureDomains:
        gcp:
        - zone: us-central1-a
        - zone: us-central1-b
        - zone: us-central1-c
        - zone: us-central1-f
$ oc get machine                  
NAME                              PHASE     TYPE            REGION        ZONE            AGE
zhsungcp22-4glmq-master-2         Running   n2-standard-4   us-central1   us-central1-c   3h4m
zhsungcp22-4glmq-master-hzsf2-0   Running   n2-standard-4   us-central1   us-central1-b   90m
zhsungcp22-4glmq-master-plch8-1   Running   n2-standard-4   us-central1   us-central1-a   11m
zhsungcp22-4glmq-worker-a-cxf5w   Running   n2-standard-4   us-central1   us-central1-a   3h
zhsungcp22-4glmq-worker-b-d5vzm   Running   n2-standard-4   us-central1   us-central1-b   3h
zhsungcp22-4glmq-worker-c-4d897   Running   n2-standard-4   us-central1   us-central1-c   3h

1. Delete failureDomain "zone: us-central1-a" in cpms, new machine Running in zone f.
      failureDomains:
        gcp:
        - zone: us-central1-b
        - zone: us-central1-c
        - zone: us-central1-f 
$ oc get machine              
NAME                              PHASE     TYPE            REGION        ZONE            AGE
zhsungcp22-4glmq-master-2         Running   n2-standard-4   us-central1   us-central1-c   3h19m
zhsungcp22-4glmq-master-b7pdl-1   Running   n2-standard-4   us-central1   us-central1-f   13m
zhsungcp22-4glmq-master-hzsf2-0   Running   n2-standard-4   us-central1   us-central1-b   106m
zhsungcp22-4glmq-worker-a-cxf5w   Running   n2-standard-4   us-central1   us-central1-a   3h16m
zhsungcp22-4glmq-worker-b-d5vzm   Running   n2-standard-4   us-central1   us-central1-b   3h16m
zhsungcp22-4glmq-worker-c-4d897   Running   n2-standard-4   us-central1   us-central1-c   3h16m
2. Add failureDomain "zone: us-central1-a" again, new machine running in zone a, the machine in zone f will be deleted.
      failureDomains:
        gcp:
        - zone: us-central1-a
        - zone: us-central1-f
        - zone: us-central1-c
        - zone: us-central1-b
$ oc get machine                          
NAME                              PHASE     TYPE            REGION        ZONE            AGE
zhsungcp22-4glmq-master-2         Running   n2-standard-4   us-central1   us-central1-c   3h35m
zhsungcp22-4glmq-master-5kltp-1   Running   n2-standard-4   us-central1   us-central1-a   12m
zhsungcp22-4glmq-master-hzsf2-0   Running   n2-standard-4   us-central1   us-central1-b   121m
zhsungcp22-4glmq-worker-a-cxf5w   Running   n2-standard-4   us-central1   us-central1-a   3h32m
zhsungcp22-4glmq-worker-b-d5vzm   Running   n2-standard-4   us-central1   us-central1-b   3h32m
zhsungcp22-4glmq-worker-c-4d897   Running   n2-standard-4   us-central1   us-central1-c   3h32m  

Actual results:

Instance is moved back from f to a

Expected results:

Instance shouldn't be moved back from f to a

Additional info:

https://issues.redhat.com//browse/OCPBUGS-7366

This is a clone of issue OCPBUGS-18720. The following is the description of the original issue:

Description of problem:

Catalog pods in hypershift control plane in ImagePullBackOff

Version-Release number of selected component (if applicable):

 

How reproducible:

always

Steps to Reproduce:

1. Create a cluster in 4.14 HO + OCP 4.14.0-0.ci-2023-09-07-120503
2. Check controlplane pods, catalog pods in control plane namespace in ImagePullBackOff
3.

Actual results:

 

jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jie-test | grep catalog
catalog-operator-64fd787d9c-98wx5                     2/2     Running            0          2m43s 
certified-operators-catalog-7766fc5b8-4s66z           0/1     ImagePullBackOff   0          2m43s 
community-operators-catalog-847cdbff6-wsf74           0/1     ImagePullBackOff   0          2m43s 
redhat-marketplace-catalog-fccc6bbb5-2d5x4            0/1     ImagePullBackOff   0          2m43s 
redhat-operators-catalog-86b6f66d5d-mpdsc             0/1     ImagePullBackOff   0          2m43s

Events:
  Type     Reason          Age                 From               Message
  ----     ------          ----                ----               -------
  Normal   Scheduled       65m                 default-scheduler  Successfully assigned clusters-jie-test/certified-operators-catalog-7766fc5b8-4s66z to ip-10-0-64-135.us-east-2.compute.internal
  Normal   AddedInterface  65m                 multus             Add eth0 [10.128.2.141/23] from openshift-sdn
  Normal   Pulling         63m (x4 over 65m)   kubelet            Pulling image "from:imagestream"
  Warning  Failed          63m (x4 over 65m)   kubelet            Failed to pull image "from:imagestream": rpc error: code = Unknown desc = reading manifest imagestream in docker.io/library/from: requested access to the resource is denied
  Warning  Failed          63m (x4 over 65m)   kubelet            Error: ErrImagePull
  Warning  Failed          63m (x6 over 65m)   kubelet            Error: ImagePullBackOff
  Normal   BackOff         9s (x280 over 65m)  kubelet            Back-off pulling image "from:imagestream"
jiezhao-mac:hypershift jiezhao$

Expected results:

catalog pods are running

Additional info:

slack:
https://redhat-internal.slack.com/archives/C01C8502FMM/p1694170060144859

Description of problem:

Users are not able to upgrade a namespace-scoped operator in the OpenShift console.
The Subscription tab is not visible in the web console to a user with project admin rights.
Only cluster-admin users are able to update the operator.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Configure IDP. Add user. 
2. Install any operator in specific namespace.
3. Assign project admin permission to the user for the same namespace
4. Login with the user and check if `Subscription` tab is visible to update the operator.

Actual results:

The user is not able to update the operator; the Subscription tab is not visible to the user in the web console.

Expected results:

The user must be able to update the namespace-scoped operator if they have admin permission for that project.

Additional info:

Tried to reproduce the issue and observed the same behavior in OCP 4.10.20, OCP 4.10.25 and OCP 4.10.34.

 

 

 

Description of problem:
As part of Chaos Monkey testing we tried to delete the machine-config-controller pod on SNO+1. Restarting the machine-config-controller pod results in restarts of the daemonset/sriov-network-config-daemon and linuxptp-daemon pods as well.

      

1m47s       Normal   Killing            pod/machine-config-controller-7f46c5d49b-w4p9s    Stopping container machine-config-controller
1m47s       Normal   Killing            pod/machine-config-controller-7f46c5d49b-w4p9s    Stopping container oauth-proxy

 

 

 

openshift-sriov-network-operator   23m         Normal   Killing            pod/sriov-network-config-daemon-pv4tr   Stopping container sriov-infiniband-cni
openshift-sriov-network-operator   23m         Normal   SuccessfulDelete   daemonset/sriov-network-config-daemon   Deleted pod: sriov-network-config-daemon-pv4tr 

Version-Release number of selected component (if applicable):

 

4.12

How reproducible:

Steps to Reproduce:

Restart the machine-config-controller pod in openshift-machine-config-operator namespace. 
1. oc get pod -n openshift-machine-config-operator 
2. oc delete  pod/machine-config-controller-xxx -n openshift-machine-config-operator 

 

 

Actual results:

Restarting it also restarts the daemonset/sriov-network-config-daemon and linuxptp-daemon pods.

Expected results:

It should not restart these pods

Additional info:

logs : https://drive.google.com/drive/folders/1XxYen8tzENrcIJdde8sortpyY5ZFZCPW?usp=share_link

Description of problem:

MetalLB does not work when traffic comes from a secondary NIC. The root cause of this failure is that the net.ipv4.ip_forward flag changed from 1 to 0. If we re-enable this flag, everything works as expected.

Version-Release number of selected component (if applicable):

Server Version: 4.14.0-0.nightly-2023-07-05-191022

How reproducible:

Run any test case that tests MetalLB via a secondary interface.

Steps to Reproduce:

1.
2.
3.

Actual results:

Test failed

Expected results:

Test Passed

Additional info:

Looks like this PR is the root cause: https://github.com/openshift/machine-config-operator/pull/3676/files#
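
As a possible interim workaround (the linked MCO change is the suspected root cause, so this is not the actual fix), a MachineConfig along the lines of the sketch below could re-enable forwarding on the affected nodes; the object name, role label and file path are assumptions:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-enable-ip-forward                    # arbitrary name
  labels:
    machineconfiguration.openshift.io/role: worker     # adjust to the affected node role
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - path: /etc/sysctl.d/99-ip-forward.conf
        mode: 0644
        overwrite: true
        contents:
          source: data:,net.ipv4.ip_forward%20%3D%201%0A   # "net.ipv4.ip_forward = 1", URL-encoded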

Invoking 'create cluster-manifests' fails when imageContentSources is missing in install-config.yaml:

$ openshift-install agent create cluster-manifests
INFO Consuming Install Config from target directory
FATAL failed to write asset (Mirror Registries Config) to disk: failed to write file: open .: is a directory

install-config.yaml:

apiVersion: v1alpha1
metadata:
  name: appliance
rendezvousIP: 192.168.122.116
hosts:
  - hostname: sno
    installerArgs: '["--save-partlabel", "agent*", "--save-partlabel", "rhcos-*"]'
    interfaces:
     - name: enp1s0
       macAddress: 52:54:00:e7:05:72
    networkConfig:
      interfaces:
        - name: enp1s0
          type: ethernet
          state: up
          mac-address: 52:54:00:e7:05:72
          ipv4:
            enabled: true
            dhcp: true 
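
Until the Mirror Registries Config asset tolerates the missing field, a likely workaround is to add an imageContentSources stanza to install-config.yaml; the mirror registry hostname below is hypothetical:

imageContentSources:
- mirrors:
  - mirror.example.com:5000/ocp4/openshift4     # hypothetical mirror registry
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - mirror.example.com:5000/ocp4/openshift4     # hypothetical mirror registry
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev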

Description of problem:

Due to an rpm-ostree regression (OKD-63), MCO was copying /var/lib/kubelet/config.json into /run/ostree/auth.json on FCOS and SCOS. This breaks the Assisted Installer flow, which starts with a Live ISO and does not have /var/lib/kubelet/config.json.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


We should adjust the CSI RPC call timeout from the sidecars to the CSI driver. We seem to be using default values, which are just too short and hence can cause unintended side effects.

This is a clone of issue OCPBUGS-22655. The following is the description of the original issue:

Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Description of problem:

As endorsed at DNS Flag Day, the DNS Community recommends a bufsize setting of 1232 as a safe default that supports larger payloads, while generally avoiding IP fragmentation on most networks. This is particularly relevant for payloads like those generated by DNSSEC, which tend to be larger.

Previously, CoreDNS always used the EDNS0 extension, which enables UDP-based DNS queries to exceed 512 bytes, when CoreDNS forwarded DNS queries to an upstream name server, and so OpenShift specified a bufsize setting of 512 to maintain compatibility with applications and name servers that did not support the EDNS0 extension.

For clients and name servers that do support EDNS0, a bufsize setting of 512 can result in more DNS truncation and unnecessary TCP retransmissions, resulting in worse DNS performance for most users. This is due to the fact that if a response is larger than the bufsize setting, it gets truncated, prompting clients to initiate a TCP retry. In this situation, two DNS requests are made for a single DNS answer, leading to higher bandwidth usage and longer response times.

Currently, CoreDNS no longer uses EDNS0 when forwarding requests if the original client request did not use EDNS0 (ref: coredns/coredns@a5b9749), and so the reasoning for using a bufsize setting of 512 no longer applies. By increasing the bufsize setting to the recommended value of 1232 bytes, we can enhance DNS performance by decreasing the probability of DNS truncations.

Using a larger bufsize setting of 1232 bytes also would potentially help alleviate bugs like https://issues.redhat.com/browse/OCPBUGS-6829 in which a non-compliant upstream DNS is not respecting a bufsize of 512 bytes and sending larger-than-512-bytes responses. A bufsize setting of 1232 bytes doesn't fix the root cause of this issue; rather, it decreases the likelihood of its occurrence by increasing the acceptable size range for UDP responses.

Note that clients that don’t support EDNS0 or TCP, such as applications built using older versions of Alpine Linux, are still subject to the aforementioned truncation issue. To avoid these issues, ensure that your application is built using a DNS resolver library that supports EDNS0 or TCP-based DNS queries.

Brief history of OpenShift's Bufsize changes:

  1. During the development of OpenShift 4.8.0, we updated to 1232 bytes due to Bug - 1949361 and backported to 4.7 and 4.6. However, later on, 4.8.0 (in development), 4.7, and 4.6 were reverted back to 512 bytes due to Bug - 1966116.
  2. Also in OpenShift 4.8.0, we bumped CoreDNS to v1.8.1, and picked up a commit that forced DNS queries that did not have the DO Bit (DNSSEC) set to set bufsize as 2048 bytes despite 512 bytes being set in the configuration.
  3. In OpenShift 4.12.0, we fixed OCPBUGS-240 to limit all DNS queries, specifically queries that had DO Bit off, to what is configured in the configuration file (512 bytes) and we backported the fix to 4.11, 4.10, and 4.9.
  4. Now, this PR is changing bufsize to 1232 bytes.

Version-Release number of selected component (if applicable):

4.14, 4.13, 4.12, 4.11

How reproducible:

100%

Steps to Reproduce:

1. oc -n openshift-dns get configmaps/dns-default -o yaml | grep -i bufsize

Actual results:

Bufsize = 512

Expected results:

Bufsize = 1232

Additional info:
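
For illustration, a heavily trimmed dns-default ConfigMap with the recommended setting would look like the sketch below (most plugins are omitted; the real Corefile also carries plugins such as kubernetes, prometheus and reload):

apiVersion: v1
kind: ConfigMap
metadata:
  name: dns-default
  namespace: openshift-dns
data:
  Corefile: |
    .:5353 {
        bufsize 1232
        errors
        cache 900
        forward . /etc/resolv.conf
    }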

 

Description of problem:

Kubernetes and other associated dependencies need to be updated to protect against potential vulnerabilities.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:
Sometimes the oc-mirror command leaves large amounts of data under the /tmp directory and runs out of disk space.

Version-Release number of selected component (if applicable):
oc mirror version
4.12/4.13

How reproducible:
Always

Steps to Reproduce:
1. The exact steps are not certain, but the following logs were seen when running the oc-mirror command:

Actual results:

[root@preserve-fedora36 588]# oc-mirror --config config.yaml docker://yinzhou-133.mirror-registry.qe.gcp.devcluster.openshift.com:5000 --dest-skip-tls
Checking push permissions for yinzhou-133.mirror-registry.qe.gcp.devcluster.openshift.com:5000
Creating directory: oc-mirror-workspace/src/publish
Creating directory: oc-mirror-workspace/src/v2
Creating directory: oc-mirror-workspace/src/charts
Creating directory: oc-mirror-workspace/src/release-signatures
No metadata detected, creating new workspace

The rendered catalog is invalid.

Run "oc-mirror list operators --catalog CATALOG-NAME --package PACKAGE-NAME" for more information.

error: error rendering new refs: render reference "registry.redhat.io/redhat/redhat-operator-index:v4.11": write /tmp/render-unpack-2866670795/tmp/cache/cache/red-hat-camel-k_latest_red-hat-camel-k-operator.v1.6.0.json: no space left on device
[root@preserve-fedora36 588]# cd /tmp/
[root@preserve-fedora36 tmp]# ls
imageset-catalog-registry-333402727  render-unpack-2230547823

Expected results:
The data created under /tmp should always be deleted, at every stage.

Additional info:

Please review the following PR: https://github.com/openshift/openshift-state-metrics/pull/97

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-9422. The following is the description of the original issue:

Description of problem:
We want to understand our users, but the first page the user opens wasn't tracked.

Version-Release number of selected component (if applicable):
Saw this on Dev Sandbox with 4.10 and 4.11 with enabled telemetry

How reproducible:
Sometimes! Looks like a race condition and requires active telemetry

Steps to Reproduce:
1. Open the browser network inspector and filter for segment
2. Open the developer console

Actual results:
1-2 identity events are sent, but no page event

Expected results:
At least one identity event and at least one page event should be sent to segment

Additional info:

Description of problem:
The vsphereStorageDriver validation error message here is odd when I change LegacyDeprecatedInTreeDriver to "". I get:

Invalid value: "string": VSphereStorageDriver can not be changed once it is set to CSIWithMigrationDriver

There is no CSIWithMigrationDriver either in the old or new Storage CR.
 
Version-Release number of selected component (if applicable):

4.13 with this PR: https://github.com/openshift/api/pull/1433

We are pushing to find a resolution for OCPBUGS-11591 and the SDN team has identified a key message that appears related in the system journald logs:

Apr 12 11:53:51.395838 ci-op-xs3rnrtc-2d4c7-4mhm7-worker-b-dwc7w ovs-vswitchd[1124]: ovs|00002|timeval(urcu4)|WARN|Unreasonably long 109127ms poll interval (0ms user, 0ms system)

We should detect this in origin and create an interval so it can be charted in the timelines, as well as a unit test that fails if detected so we can see where it's happening.

The agent-tui interface for editing the network config for the Agent ISO at boot time only runs on the graphical console (tty1). It's difficult to run two copies, so this gives the most value for now.

Although tty1 always exists, OCI only has a serial console available (assuming it is enabled - see OCPBUGS-19092), so the user doesn't see anything on the console while agent-tui is running (and in fact the systemd progress output is suspended for the duration).

Network configuration of any kind is rarely needed in the cloud, anyway. So on OCI specifically we mostly are slowing boot down by 20s for no real reason. We should disable agent-tui in this case - either by disabling the service or simply not adding the binary to the ISO image.

This is a clone of issue OCPBUGS-19092. The following is the description of the original issue:

When creating an Agent ISO for OCI, we should add the kernel argument console=ttyS0 to the ISO/PXE kargs.

CoreOS does not include a console arg by default when using metal as the platform because different hardware has different consoles and specifying one can cause booting to fail on some, but it does on many cloud platforms. Since we know when the user is definitely using OCI (there are validations in assisted that ensure it) and we know the correct settings for OCI, we should set them up automatically.

Description of problem:

prow CI job: periodic-ci-openshift-openshift-tests-private-release-4.14-amd64-nightly-4.14-upgrade-from-stable-4.13-aws-ipi-ovn-hypershift-replace-f7 failed in the step of upgrading the HCP image of the hosted cluster.

one failed job link:  https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/pr-logs/pull/opens[…]-hypershift-replace-f7/1712338041915314176

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

* retrigger/rehearsal the job
or 
* create a 4.13 stable hosted cluster and upgrade it to 4.14 nightly manually 

Actual results:

The upgrade failed when using the 4.14 nightly image for the `hostedcluster`.

Expected results:

The hostedcluster/nodepool upgrade completes successfully.

Additional info:

The dump file can be retrieved from the job artifacts.

Description of problem:

When the CNO is managed by HyperShift, multus-admission-controller does not have correct RollingUpdate parameters meeting the HyperShift requirements outlined here: https://github.com/openshift/hypershift/blob/646bcef53e4ecb9ec01a05408bb2da8ffd832a14/support/config/deployment.go#L81
```
There are two standard cases currently with hypershift: HA mode where there are 3 replicas spread across zones and then non ha with one replica. When only 3 zones are available you need to be able to set maxUnavailable in order to progress the rollout. However, you do not want to set that in the single replica case because it will result in downtime.
```
So when multus-admission-controller has more than one replica the RollingUpdate parameters should be
```
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
```

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Create an OCP cluster using Hypershift
2. Check the rolling update parameters of multus-admission-controller (for example with the command sketched below)
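One way to check (a sketch; the hosted control plane namespace is a placeholder):

$ oc -n <hosted-control-plane-namespace> get deployment multus-admission-controller -o jsonpath='{.spec.strategy}'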

Actual results:

the operator has default parameters: {"rollingUpdate":{"maxSurge":"25%","maxUnavailable":"25%"},"type":"RollingUpdate"}

Expected results:

{"rollingUpdate":{"maxSurge":0,"maxUnavailable":1},"type":"RollingUpdate"}

Additional info:

 

Description of problem:

When trying to delete a BMH object that is unmanaged, Metal3 cannot delete it. The BMH object is unmanaged because it does not provide any BMC information (neither address nor credentials).

In this case Metal3 tries to delete the host but fails and never finalizes, so the BMH deletion gets stuck.
This is the log from Metal3:

{"level":"info","ts":1676531586.4898946,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/worker-0.el8k-ztp-1.hpecloud.org"}                                                                                          
{"level":"info","ts":1676531586.4980938,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/master-1.el8k-ztp-1.hpecloud.org"}                                                                                          
{"level":"info","ts":1676531586.5050912,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/master-2.el8k-ztp-1.hpecloud.org"}                                                                                          
{"level":"info","ts":1676531586.5105371,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/worker-0.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600}                                
{"level":"info","ts":1676531586.51569,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/master-0.el8k-ztp-1.hpecloud.org"}                                                                                            
{"level":"info","ts":1676531586.5191178,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/master-1.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600}                                
{"level":"info","ts":1676531586.525755,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/master-2.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600}                                 
{"level":"info","ts":1676531586.5356712,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/master-0.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600}                                
{"level":"info","ts":1676532186.5117555,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/worker-0.el8k-ztp-1.hpecloud.org"}                                                                                          
{"level":"info","ts":1676532186.5195107,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/master-1.el8k-ztp-1.hpecloud.org"}                                                                                          
{"level":"info","ts":1676532186.526355,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/master-2.el8k-ztp-1.hpecloud.org"}                                                                                           
{"level":"info","ts":1676532186.5317476,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/worker-0.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600}
{"level":"info","ts":1676532186.5361836,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/master-0.el8k-ztp-1.hpecloud.org"}                                                                                          
{"level":"info","ts":1676532186.5404322,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/master-1.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600}
{"level":"info","ts":1676532186.5482726,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/master-2.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600}
{"level":"info","ts":1676532186.555394,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"openshift-machine-api/master-0.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged","requeue":true,"after":600}
{"level":"info","ts":1676532532.3448665,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"openshift-machine-api/worker-1.el8k-ztp-1.hpecloud.org"}                                                                                          
{"level":"info","ts":1676532532.344922,"logger":"controllers.BareMetalHost","msg":"hardwareData is ready to be deleted","baremetalhost":"openshift-machine-api/worker-1.el8k-ztp-1.hpecloud.org"}
{"level":"info","ts":1676532532.3656478,"logger":"controllers.BareMetalHost","msg":"Initiating host deletion","baremetalhost":"openshift-machine-api/worker-1.el8k-ztp-1.hpecloud.org","provisioningState":"unmanaged"}
{"level":"error","ts":1676532532.3656952,"msg":"Reconciler error","controller":"baremetalhost","controllerGroup":"metal3.io","controllerKind":"BareMetalHost","bareMetalHost":{"name":"worker-1.el8k-ztp-1.hpecloud.org","namespace":"openshift-machine-api"},
"namespace":"openshift-machine-api","name":"worker-1.el8k-ztp-1.hpecloud.org","reconcileID":"525a5b7d-077d-4d1e-a618-33d6041feb33","error":"action \"unmanaged\" failed: failed to determine current provisioner capacity: failed to parse BMC address informa
tion: missing BMC address","errorVerbose":"missing BMC address\ngithub.com/metal3-io/baremetal-operator/pkg/hardwareutils/bmc.NewAccessDetails\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/github.com/metal3-io/baremetal-operator/pkg/hardwareu
tils/bmc/access.go:145\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).bmcAccess\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:112\ngithub.com/metal3-io/baremetal-operator/pkg/pro
visioner/ironic.(*ironicProvisioner).HasCapacity\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:1922\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ensureCapacity\n\t/go/src/githu
b.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:83\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).updateHostStateFrom\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/meta
l3.io/host_state_machine.go:106\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:175\ngithub.com/metal
3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:186\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareM
etalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:226\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremet
al-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/contr
oller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/contro
ller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234\nruntime.goexit\
n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594\nfailed to parse BMC address information\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).bmcAccess\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/iro
nic/ironic.go:114\ngithub.com/metal3-io/baremetal-operator/pkg/provisioner/ironic.(*ironicProvisioner).HasCapacity\n\t/go/src/github.com/metal3-io/baremetal-operator/pkg/provisioner/ironic/ironic.go:1922\ngithub.com/metal3-io/baremetal-operator/controlle
rs/metal3%2eio.(*hostStateMachine).ensureCapacity\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:83\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).updateHostStateFrom\n
\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:106\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState.func1\n\t/go/src/github.com/metal3-io/baremetal-operator
/controllers/metal3.io/host_state_machine.go:175\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:186\ngithu
b.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:226\nsigs.k8s.io/controller-runtime/pkg/internal/controll
er.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/sr
c/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-
operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-
runtime/pkg/internal/controller/controller.go:234\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594\nfailed to determine current provisioner capacity\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ensur
eCapacity\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:85\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).updateHostStateFrom\n\t/go/src/github.com/metal3-io/baremetal
-operator/controllers/metal3.io/host_state_machine.go:106\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState.func1\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machin
e.go:175\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:186\ngithub.com/metal3-io/baremetal-operator/contr
ollers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:226\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/gi
thub.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operato
r/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-r
untime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controll
er.go:234\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594\naction \"unmanaged\" failed\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operato
r/controllers/metal3.io/baremetalhost_controller.go:230\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/contr
oller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller
-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.
(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594","stacktrace":"sigs.k8s.io/cont
roller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/contr
oller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Provide a BMH object with no BMC credentials. The BMH is set unmanaged.

Steps to Reproduce:

1. Delete an unmanaged BMH object (a minimal example is sketched below)
2. The deletion gets stuck
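For illustration, a minimal unmanaged BMH of the kind that triggers this (names are hypothetical; note the missing spec.bmc section, which is what leaves the host unmanaged):

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-1.example.com
  namespace: openshift-machine-api
spec:
  online: false
  bootMACAddress: "52:54:00:00:00:01"
  # no spec.bmc (address/credentialsName), so the host stays in the unmanaged state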

Actual results:

The deletion gets stuck.

Expected results:

Metal3 should detect that the BMH is unmanaged and not try to deprovision it.

Additional info:

 

Description of problem:

Prerequisites (if any, like setup, operators/versions):

Steps to Reproduce

  1. Create and start a pipeline, then navigate to the Pipeline metrics page

Actual results:

The Pipeline metrics page crashes

Expected results:

The Pipeline metrics page should work

Reproducibility (Always/Intermittent/Only Once):

Always

Build Details:

4.14.0-0.nightly-2023-05-29-174116

Workaround:

Additional info:

This is a regression introduced after this commit was merged: https://github.com/openshift/console/pull/12821/commits/c2d24932cd41b1b4c89d7b9fa5ca46d18b0d2d29#diff-782cbf3ae7050932e76be67d990d9cdaa02e322ea6c2b53083a677ed311ff612R40

 

Description of problem:

The TRT ComponentReadiness tool shows what looks like a regression (https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?arch=amd64&baseEndTime=2023-05-16%2023%3A59%3A59&baseRelease=4.13&baseStartTime=2023-04-16%2000%3A00%3A00&capability=Other&component=Monitoring&confidence=95&environment=ovn%20no-upgrade%20amd64%20aws%20hypershift&excludeArches=heterogeneous%2Carm64%2Cppc64le%2Cs390x&groupBy=cloud%2Carch%2Cnetwork&ignoreDisruption=true&ignoreMissing=false&minFail=3&network=ovn&pity=5&platform=aws&sampleEndTime=2023-07-20%2023%3A59%3A59&sampleRelease=4.14&sampleStartTime=2023-07-13%2000%3A00%3A00&testId=openshift-tests%3A79898d2e28b78374d89e10b38f88107b&testName=%5Bsig-instrumentation%5D%20Prometheus%20%5Bapigroup%3Aimage.openshift.io%5D%20when%20installed%20on%20the%20cluster%20should%20report%20telemetry%20%5BLate%5D%20%5BSkipped%3ADisconnected%5D%20%5BSuite%3Aopenshift%2Fconformance%2Fparallel%5D&upgrade=no-upgrade&variant=hypershift)

in the "[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster should report telemetry [Late] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]" test.

In the ComponentReadiness link above, you can see the sample runs (linked with red "F").

Version-Release number of selected component (if applicable):

4.14

How reproducible:

The pass rate in 4.13 is 100% vs. 81% in 4.14

Steps to Reproduce:

1. The query above focuses on "periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn-conformance" jobs and the specific test mentioned. You can see the failures by clicking on the red "F"s
2.
3.

Actual results:

The failures look like:

{  fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:365]: Unexpected error:
    <errors.aggregate | len:2, cap:2>: 
    [promQL query returned unexpected results:
    metricsclient_request_send{client="federate_to",job="telemeter-client",status_code="200"} >= 1
    [], promQL query returned unexpected results:
    federate_samples{job="telemeter-client"} >= 10
    []]
    [
        <*errors.errorString | 0xc0017611b0>{
            s: "promQL query returned unexpected results:\nmetricsclient_request_send{client=\"federate_to\",job=\"telemeter-client\",status_code=\"200\"} >= 1\n[]",
        },
        <*errors.errorString | 0xc00203d380>{
            s: "promQL query returned unexpected results:\nfederate_samples{job=\"telemeter-client\"} >= 10\n[]",
        },
    ]

Expected results:

Query should succeed
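To debug by hand, the same queries can be run against the in-cluster Prometheus (a sketch; assumes the usual prometheus-k8s-0 pod in openshift-monitoring and that curl is available in the container):

$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
    curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode 'query=metricsclient_request_send{client="federate_to",job="telemeter-client",status_code="200"}'
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
    curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode 'query=federate_samples{job="telemeter-client"}'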

Additional info:

I set the severity to Major because this looks like a regression from where it was in the 5 weeks before 4.13 went GA.

Description of problem:

There are different versions and channels for the operator, but they may all use the same 'latest' tag; when mirroring them as `additionalImages`, the below error occurred:

[root@ip-172-31-249-209 jian]# oc-mirror --config mirror.yaml file:///root/jian/test/
...
...
sha256:672b4bee759f8115e5538a44c37c415b362fc24b02b0117fd4bdcc129c53e0a1 file://brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator:latest
sha256:d90aecc425e1b2e0732d0a90bc84eb49eb1139e4d4fd8385070d00081c80b71c file://brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator:latest
error: unable to push manifest to file://brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator:latest: symlink sha256:f6b6a15c4477615ff202e73d77fc339977aeeca714b9667196509d53e2d2e4f5 /root/jian/test/oc-mirror-workspace/src/v2/openshift4/ose-cluster-kube-descheduler-operator/manifests/latest.download: file exists
error: unable to push manifest to file://brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator:latest: symlink sha256:6a1de43c60d021921973e81c702e163a49300254dc3b612fd62ed2753efe4f06 /root/jian/test/oc-mirror-workspace/src/v2/openshift4/ose-cluster-kube-descheduler-operator/manifests/latest.download: file exists
info: Mirroring completed in 22.48s (125.8MB/s)
error: one or more errors occurred while uploading images

Version-Release number of selected component (if applicable):

[root@ip-172-31-249-209 jian]# oc-mirror version
Client Version: version.Info{Major:"0", Minor:"1", GitVersion:"v0.1.0", GitCommit:"6ead1890b7a21b6586b9d8253b6daf963717d6c3", GitTreeState:"clean", BuildDate:"2022-08-25T05:27:39Z", GoVersion:"go1.17.12", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

always

Steps to Reproduce:

1. use the below config:
[cloud-user@preserve-olm-env2 mirror-tmp]$ cat mirror.yaml
apiVersion: mirror.openshift.io/v1alpha1
kind: ImageSetConfiguration
# archiveSize: 4
mirror:
  additionalImages:
    - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:46a62d73aeebfb72ccc1743fc296b74bf2d1f80ec9ff9771e655b8aa9874c933
    - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:9e549c09edc1793bef26f2513e72e589ce8f63a73e1f60051e8a0ae3d278f394
    - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:c16891ee9afeb3fcc61af8b2802e56605fff86a505e62c64717c43ed116fd65e
    - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:5c37bd168645f3d162cb530c08f4c9610919d4dada2f22108a24ecdea4911d60
    - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:89a6abbf10908e9805d8946ad78b98a13a865cefd185d622df02a8f31900c4c1
    - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:de5b339478e8e1fc3bfd6d0b6784d91f0d3fbe0a133354be9e9d65f3d7906c2d
    - name: brew.registry.redhat.io/rh-osbs/openshift-ose-cluster-kube-descheduler-operator-bundle@sha256:fdf774c4365bde48d575913d63ef3db00c9b4dda5c89204029b0840e6dc410b1
    - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:d90aecc425e1b2e0732d0a90bc84eb49eb1139e4d4fd8385070d00081c80b71c
    - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:15cc75164335fa178c80db4212d11e4a793f53d2b110c03514ce4c79a3717ca0
    - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:9e66db3a282ee442e71246787eb24c218286eeade7bce4d1149b72288d3878ad
    - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:546b14c1f3fb02b1a41ca9675ac57033f2b01988b8c65ef3605bcc7d2645be60
    - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:12d7061012fd823b57d7af866a06bb0b1e6c69ec8d45c934e238aebe3d4b68a5
    - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:41025e3e3b72f94a3290532bdd6cabace7323c3086a9ce434774162b4b1dd601
    - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:672b4bee759f8115e5538a44c37c415b362fc24b02b0117fd4bdcc129c53e0a1
    - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:92542b22911fbd141fadc53c9737ddc5e630726b9b53c477f4dfe71b9767961f
    - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:f6b6a15c4477615ff202e73d77fc339977aeeca714b9667196509d53e2d2e4f5
    - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:1feb7073dec9341cadcc892df39ae45c427647fb034cf09dce1b7aa120bbb459
    - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:7ca05f93351959c0be07ec3af84ffe6bb5e1acea524df210b83dd0945372d432
    - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:c0fe8830f8fdcbe8e6d69b90f106d11086c67248fa484a013d410266327a4aed
    - name: brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator@sha256:6a1de43c60d021921973e81c702e163a49300254dc3b612fd62ed2753efe4f06
    - name: brew.registry.redhat.io/openshift4/ose-descheduler@sha256:b386d0e1c9e12e9a3a07aa101257c6735075b8345a2530d60cf96ff970d3d21a


2. Run the command:
$ oc-mirror --config mirror.yaml file:///root/jian/test/  

Actual results:

error: unable to push manifest to file://brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator:latest: symlink sha256:f6b6a15c4477615ff202e73d77fc339977aeeca714b9667196509d53e2d2e4f5 /root/jian/test/oc-mirror-workspace/src/v2/openshift4/ose-cluster-kube-descheduler-operator/manifests/latest.download: file exists
error: unable to push manifest to file://brew.registry.redhat.io/openshift4/ose-cluster-kube-descheduler-operator:latest: symlink sha256:6a1de43c60d021921973e81c702e163a49300254dc3b612fd62ed2753efe4f06 /root/jian/test/oc-mirror-workspace/src/v2/openshift4/ose-cluster-kube-descheduler-operator/manifests/latest.download: file exists

Expected results:

No error

Additional info:

 

Description of problem:

The e2e-nutanix test run failed at the bootstrap stage when testing the PR https://github.com/openshift/cloud-provider-nutanix/pull/7. The bootstrap failure could also be reproduced by manually creating a Nutanix OCP cluster with the latest nutanix-ccm image.

time="2023-03-06T12:25:56-05:00" level=error msg="Bootstrap failed to complete: timed out waiting for the condition"
time="2023-03-06T12:25:56-05:00" level=error msg="Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane."
time="2023-03-06T12:25:56-05:00" level=warning msg="The bootstrap machine is unable to resolve API and/or API-Int Server URLs" 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

From the PR https://github.com/openshift/cloud-provider-nutanix/pull/7, trigger the e2e-nutanix test. The test will fail at bootstrap stage with the described errors.

Actual results:

The e2e-nutanix test run failed at bootstrapping with the errors: 

level=error msg=Bootstrap failed to complete: timed out waiting for the condition
level=error msg=Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane.

Expected results:

The e2e-nutanix test will pass

Additional info:

Investigation showed that the root cause was that the Nutanix cloud-controller-manager pod did not have permission to get/list ConfigMap resources. The error logs from the Nutanix cloud-controller-manager pod:

E0307 16:08:31.753165       1 reflector.go:140] pkg/provider/client.go:124: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-controller-manager:cloud-controller-manager" cannot list resource "configmaps" in API group "" at the cluster scope
I0307 16:09:30.050507       1 reflector.go:257] Listing and watching *v1.ConfigMap from pkg/provider/client.go:124
W0307 16:09:30.052278       1 reflector.go:424] pkg/provider/client.go:124: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-controller-manager:cloud-controller-manager" cannot list resource "configmaps" in API group "" at the cluster scope
E0307 16:09:30.052308       1 reflector.go:140] pkg/provider/client.go:124: Failed to watch *v1.ConfigMap: failed to list *v1.ConfigMap: configmaps is forbidden: User "system:serviceaccount:openshift-cloud-controller-manager:cloud-controller-manager" cannot list resource "configmaps" in API group "" at the cluster scope 
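For illustration, a minimal sketch of the kind of RBAC grant that would clear this error (object names are hypothetical; the actual fix may extend the existing CCM roles instead):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nutanix-ccm-configmap-reader   # hypothetical name
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: nutanix-ccm-configmap-reader   # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: nutanix-ccm-configmap-reader
subjects:
- kind: ServiceAccount
  name: cloud-controller-manager
  namespace: openshift-cloud-controller-manager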

Please review the following PR: https://github.com/openshift/vmware-vsphere-csi-driver/pull/62

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

There are several labels used by the Nutanix platform which can vary between instances. If not set as ignore labels on the Cluster Autoscaler, features such as balancing similar node groups will not work predictably.

The Cluster Autoscaler Operator should be updated with the following labels on Nutanix:

* nutanix.com/prism-element-name
* nutanix.com/prism-element-uuid
* nutanix.com/prism-host-name
* nutanix.com/prism-host-uuid

for reference see this code: https://github.com/openshift/cluster-autoscaler-operator/blob/release-4.14/pkg/controller/clusterautoscaler/clusterautoscaler.go#L72-L159

Version-Release number of selected component (if applicable):

master, 4.14

How reproducible:

always

Steps to Reproduce:

1. create a ClusterAutoscaler CR on Nutanix platform
2. inspect the deployment for the cluster-autoscaler
3. see that it does not have the ignore labels added as command line flags

Actual results:

labels are not added as flags

Expected results:

labels should be added as flags
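For illustration, the expected outcome is that the cluster-autoscaler deployment on Nutanix carries these labels via the upstream balancing-ignore-label flag, roughly as follows (a sketch, not the literal operator output):

--balancing-ignore-label=nutanix.com/prism-element-name
--balancing-ignore-label=nutanix.com/prism-element-uuid
--balancing-ignore-label=nutanix.com/prism-host-name
--balancing-ignore-label=nutanix.com/prism-host-uuid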

Additional info:

This should probably be backported to 4.13 as well, since the labels will be applied by the Nutanix CCM

Description of problem:

Since 10/17, the Hypershift e2e job has been blocking the 4.14 CI payload. Here is a link for the job failures: 

https://sippy.dptools.openshift.org/sippy-ng/jobs/4.14/analysis?filters=%7B%22items%22:%5B%7B%22columnField%22:%22name%22,%22operatorValue%22:%22equals%22,%22value%22:%22periodic-ci-openshift-hypershift-release-4.14-periodics-e2e-aws-ovn%22%7D%5D%7D

Main tests that failed:
* TestNodePool/ValidateHostedCluster/EnsurePSANotPrivileged (0s)
* TestNodePool/ValidateHostedCluster (5m8s)
* TestNodePool


Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

oc should not append the -x86_64 suffix when mirroring multi-arch payloads

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. oc adm release mirror quay.io/openshift-release-dev/ocp-release:4.12.13-multi --keep-manifest-list=true --to=someregistry.io/somewhere/release
2.
3.

Actual results:

05-31 04:54:15.807        sha256:cd8639e34840833dd98d8323f1999b00ca06c73d7ae9ad8945f7b397450821ee -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-insights-operator
05-31 04:54:15.807        sha256:d0443f26968a2159e8b9590b33c428b6af7c0220ab6cc13633254d8843818cdf -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-keepalived-ipfailover
05-31 04:54:15.807        sha256:d2126187264d04f812068c03b59316547f043f97e90ec1a605ac24ab008c85a0 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-agent-installer-orchestrator
05-31 04:54:15.807        sha256:d445a4ece53f0695f1b812920e4bbb8a73ceef582918a0f376c2c5950a3e050b -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-ovn-kubernetes
05-31 04:54:15.807        sha256:d4bfe3bac81d5bb758efced8706a400a4b1dad7feb2c9a9933257fde9f405866 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-csi-snapshot-controller
05-31 04:54:15.807        sha256:d50c009e4b47bb6d93125c08c19c13bf7fd09ada197b5e0232549af558b25d19 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-vsphere-csi-driver-operator
05-31 04:54:15.807        sha256:d844ecbbba99e64988f4d57de9d958172264e88b9c3bfc7b43e5ee19a1a2914e -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-ironic
05-31 04:54:15.807        sha256:d90b37357d4c2c0182787f6842f89f56aaebeab38a139c62f4a727126e036578 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-baremetal-machine-controllers
05-31 04:54:15.807        sha256:d928536d8d9c4d4d078734004cc9713946da288b917f1953a8e7b1f2a8428a64 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-azure-cloud-controller-manager
05-31 04:54:15.807        sha256:da049d5a453eeb7b453e870a0c52f70df046f2df149bca624248480ef83f2ac8 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-cli-artifacts
05-31 04:54:15.807        sha256:db1cf013e3f845be74553eecc9245cc80106b8c70496bbbc0d63b497dcbb6556 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-cluster-capi-controllers
05-31 04:54:15.807        sha256:dc7b1305c7fec48d29adc4d8b3318d3b1d1d12495fb2d0ddd49a33e3b6aed0cc -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-gcp-pd-csi-driver
05-31 04:54:15.807        sha256:de8753eb8b2ccec3474016cd5888d03eeeca7e0f23a171d85b4f9d76d91685a3 -> 4.14.0-0.nightly-multi-2023-05-30-024840-x86_64-baremetal-installer

Expected results:

No -x86_64 suffix should be added to the image tags

Additional info:

 

Description of the problem:

BE 2.16: the base domain allows strings as short as one character. This results in a cluster address like clustername.r, but on the Networking page the "DNS wildcard not configured" validation error is shown.

How reproducible:

100%

Steps to reproduce:

1. Create a cluster with a one-character base domain (e.g. "c")

2. Move to the Networking page

3. Set all needed info (API + Ingress VIPs). The validation error "DNS wildcard not configured" is shown

Actual results:

 

Expected results:

Description of the problem:

In the Create cluster wizard -> Networking page, an error is shown saying that the cluster is not ready yet. The warning message suggests defining the API or Ingress VIP, but they are already entered in the form and in the YAML (see the screenshots attached).

Also, the hosts are oscillating between the "Pending input" and "Insufficient" states, with the errors shown in the images.

Found this error while testing epic MGMT-9907

MCE image 2.3.0-DOWNANDBACK-2023-03-28-23-01-58

 

Please review the following PR: https://github.com/openshift/cluster-capi-operator/pull/112

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

The vsphere-problem-detector feature is triggering VSphereOpenshiftClusterHealthFail alerts regarding “CheckFolderPermissions” and “CheckDefaultDatastore” after upgrading from 4.9.54, forcing users to update their configuration solely to get around the problem detector. Depending on customer policies around vCenter passwords or configuration updates, this can be a major obstacle for a user who wants to keep the current vSphere settings, since they worked correctly in previous OpenShift versions.

Version-Release number of selected component (if applicable):

4.10.55

How reproducible:

Consistently

Steps to Reproduce:

1. Upgrade a cluster to 4.10 with invalid vSphere credentials

Actual results:

The cluster-storage-operator fires alerts regarding the vSphere configuration in OpenShift.

Expected results:

The vsphere-problem-detector should be bypassed if the user doesn't want to make a config change, since the setup is working and upgrades like this succeeded for users prior to 4.10.

Additional info:

 

When making a change to the uninstaller for GCP, the linter picked up an error:

 

 

pkg/destroy/gcp/gcp.go:42:2: found a struct that contains a context.Context field (containedctx)
	Context           context.Context 

 

 

Contexts should not be added to structs. Instead the context should be created at the top level of the uninstaller OR a separate context can be used for each stage of the uninstallation process.

 

Currently this error can be bypassed by adding:

//nolint:containedctx 

to the offending line
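A minimal sketch of the preferred shape, with illustrative names only (not the actual uninstaller code): the context is created once at the top level and passed to each stage as a parameter.

package gcp

import (
	"context"
	"time"
)

// ClusterUninstaller no longer stores a context.Context field (illustrative only).
type ClusterUninstaller struct {
	ProjectID string
}

// Run creates the context at the top level and passes it down explicitly,
// which satisfies the containedctx linter.
func (o *ClusterUninstaller) Run() error {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
	defer cancel()
	return o.destroyCluster(ctx)
}

func (o *ClusterUninstaller) destroyCluster(ctx context.Context) error {
	// Each destruction stage receives ctx as a parameter instead of reading
	// it from the struct.
	select {
	case <-ctx.Done():
		return ctx.Err()
	default:
		return nil
	}
}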

 

Please review the following PR: https://github.com/openshift/kube-state-metrics/pull/94

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

In the administrator console UI, an admin user goes to "Workloads -> Pods", selects a project (for example openshift-console), selects a pod, goes to the Pod details page, clicks the "Metrics" tab, then clicks on the "Network in" or "Network out" graph to show the Prometheus expression. There are spaces before and after "pod_network_name_info": it appears as "( pod_network_name_info )", while "pod_network_name_info" is enough.

"Network in" expression

(sum(irate(container_network_receive_bytes_total{pod='console-5f4978747c-vmxqf', namespace='openshift-console'}[5m])) by (pod, namespace, interface)) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info )

"Network out" expression

(sum(irate(container_network_transmit_bytes_total{pod='console-5f4978747c-vmxqf', namespace='openshift-console'}[5m])) by (pod, namespace, interface)) + on(namespace,pod,interface) group_left(network_name) ( pod_network_name_info ) 
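For comparison, the expected form of the "Network out" expression without the redundant wrapping (a sketch based on the description above):

(sum(irate(container_network_transmit_bytes_total{pod='console-5f4978747c-vmxqf', namespace='openshift-console'}[5m])) by (pod, namespace, interface)) + on(namespace,pod,interface) group_left(network_name) pod_network_name_info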

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-05-19-234822

How reproducible:

always

Steps to Reproduce:

1. see the description
2.
3.

Actual results:

there are spaces before and after pod_network_name_info

Expected results:

no additional spaces

Additional info:

the bug does not have functional impact 

Updated Description:

The MCD, during a node's lifespan, can go through multiple iterations of RHEL8 and RHEL9. This was not a problem until we turned on FIPS-enabled golang with dynamic linking, which requires the running MCD binary (either in a container or on the host) to always match the host-built version. As an additional complication, we have an early boot process (machine-config-daemon-pull/firstboot.service) that can be on a different version from the rest of the cluster nodes (the bootimage version is not updated), and we also chroot (dynamically go from rhel8 to rhel9) in the container, so we need a better process to ensure the right binary is always used.

 

Current testing of this flow in https://github.com/openshift/machine-config-operator/pull/3799 

 

Description of problem:

MCO CI started failing this week, and the failures have also made it into 4.14 nightlies. See also: https://issues.redhat.com/browse/TRT-1143. The failure manifests as a warning in the MCO. Looking at an MCD log, you will see a failure like:

W0712 08:52:15.475268    7971 daemon.go:1089] Got an error from auxiliary tools: kubelet health check has failed 3 times: Get "http://localhost:10248/healthz": dial tcp: lookup localhost: device or resource busy

The root cause so far seems to be that 4.14 switched the builder from a regular 1.20.3 golang to 1.20.5 with FIPS and dynamic linking, causing the failures to begin. Most functionality is not broken, but the daemon subroutine that does the kubelet health check appears to be unable to reach the localhost endpoint.

One possibility is that the rhel8 daemon chroot'ing into the rhel9 host and running these commands is causing the issue. Regardless, there are a number of issues with the rhel8/rhel9 duality in the MCD that we would need to address in 4.13/4.14.

Also tangentially related: https://issues.redhat.com/browse/MCO-663
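When debugging, the same endpoint can be probed directly on the host to confirm kubelet itself is healthy and to isolate the failure to name resolution inside the MCD container (a sketch):

$ oc debug node/<node-name> -- chroot /host curl -s http://localhost:10248/healthz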

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/cloud-provider-gcp/pull/29

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.14-e2e-openstack-sdn/1682353286402805760 failed with:

fail [github.com/openshift/origin/test/extended/authorization/scc.go:69]: 2 pods failed before test on SCC errors
Error creating: pods "openstack-cinder-csi-driver-controller-7c4878484d-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[0].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[0].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[0].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[0].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[0].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[0].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[1].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[1].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[1].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[1].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[1].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[1].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[2].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[2].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[2].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[2].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[2].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[2].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[3].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[3].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[3].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[3].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[3].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[3].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[4].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[4].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[4].containers[2].hostPort: 
Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[4].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[4].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[4].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[5].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[5].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[5].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[5].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[5].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[5].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[6].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[6].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[6].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[6].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[6].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[6].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[7].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[7].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[7].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[7].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[7].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[7].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[8].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[8].containers[0].hostPort: Invalid value: 10301: Host ports are not allowed to be used, provider restricted-v2: .containers[8].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[8].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[8].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[8].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider restricted-v2: .containers[9].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[9].containers[0].hostPort: Invalid value: 10301: Host ports 
are not allowed to be used, provider restricted-v2: .containers[9].containers[2].hostPort: Invalid value: 9202: Host ports are not allowed to be used, provider restricted-v2: .containers[9].containers[4].hostPort: Invalid value: 9203: Host ports are not allowed to be used, provider restricted-v2: .containers[9].containers[6].hostPort: Invalid value: 9204: Host ports are not allowed to be used, provider restricted-v2: .containers[9].containers[8].hostPort: Invalid value: 9205: Host ports are not allowed to be used, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for ReplicaSet.apps/v1/openstack-cinder-csi-driver-controller-7c4878484d -n openshift-cluster-csi-drivers happened 13 times
Error creating: pods "openstack-cinder-csi-driver-node-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[7]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[8]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, provider restricted-v2: .containers[0].privileged: Invalid value: true: Privileged containers are not allowed, provider restricted-v2: .containers[0].capabilities.add: Invalid value: "SYS_ADMIN": capability may not be added, provider restricted-v2: .containers[0].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[0].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider restricted-v2: .containers[0].allowPrivilegeEscalation: Invalid value: true: Allowing privilege escalation for containers is not allowed, provider restricted-v2: .containers[1].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[1].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider restricted-v2: .containers[2].hostNetwork: Invalid value: true: Host network is not allowed to be used, provider restricted-v2: .containers[2].containers[0].hostPort: Invalid value: 10300: Host ports are not allowed to be used, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for DaemonSet.apps/v1/openstack-cinder-csi-driver-node -n openshift-cluster-csi-drivers happened 12 times

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Please review the following PR: https://github.com/openshift/kube-rbac-proxy/pull/64

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When installing a HyperShift cluster into ap-southeast-3 (currently only available in the production environment), the install never succeeds because the hosted KCM pods are stuck in CrashLoopBackOff.

Version-Release number of selected component (if applicable):

4.12.18

How reproducible:

100%

Steps to Reproduce:

1. Install a HyperShift Cluster in ap-southeast-3 on AWS

Actual results:

kube-controller-manager-54fc4fff7d-2t55x                 1/2     CrashLoopBackOff   7 (2m49s ago)   16m
kube-controller-manager-54fc4fff7d-dxldc                 1/2     CrashLoopBackOff   7 (93s ago)     16m
kube-controller-manager-54fc4fff7d-ww4kv                 1/2     CrashLoopBackOff   7 (21s ago)     15m

With selected "important" logs:
I0606 15:16:25.711483       1 event.go:294] "Event occurred" object="kube-system/kube-controller-manager" fieldPath="" kind="ConfigMap" apiVersion="v1" type="Normal" reason="LeaderElection" message="kube-controller-manager-54fc4fff7d-ww4kv_6dbab916-b4bf-447f-bbb2-5037864e7f78 became leader"
I0606 15:16:25.711498       1 event.go:294] "Event occurred" object="kube-system/kube-controller-manager" fieldPath="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="kube-controller-manager-54fc4fff7d-ww4kv_6dbab916-b4bf-447f-bbb2-5037864e7f78 became leader"
W0606 15:16:25.741417       1 plugins.go:132] WARNING: aws built-in cloud provider is now deprecated. The AWS provider is deprecated and will be removed in a future release. Please use https://github.com/kubernetes/cloud-provider-aws
I0606 15:16:25.741763       1 aws.go:1279] Building AWS cloudprovider
F0606 15:16:25.742096       1 controllermanager.go:245] error building controller context: cloud provider could not be initialized: could not init cloud provider "aws": not a valid AWS zone (unknown region): ap-southeast-3a

Expected results:

The KCM pods are Running

Description of problem:

After adding additional CPU and Memory to the OpenShift Container Platform 4 - Control-Plane Node(s), it was noticed that a new MachineConfig was rolled out, causing all OpenShift Container Platform 4 - Node(s) to reboot unexpectedly.

Interestingly enough, no new MachineConfig was rendered; instead a slightly older MachineConfig was picked and applied to all OpenShift Container Platform 4 - Node(s) after the change on the Control-Plane Node(s) was performed.

The only visible change found in the MachineConfig was that nodeStatusUpdateFrequency was updated from 10s to 0s even though nodeStatusUpdateFrequency is not specified or configured in any MachineConfig or KubeletConfig.

https://issues.redhat.com/browse/OCPBUGS-6723 was found, but given that the affected OpenShift Container Platform 4 - Cluster is running 4.11.35, it's difficult to understand what happened, as this problem was generally suspected to be solved.

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.11.35

How reproducible:

Unknown

Steps to Reproduce:

1. OpenShift Container Platform 4 on AWS
2. Update the OpenShift Container Platform 4 - Control-Plane Node(s) to add more CPU and Memory
3. Check whether a potential MachineConfig update is being applied
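To see exactly what changed, the rendered MachineConfigs the pool moved between can be diffed (a sketch, assuming the worker pool is the one that rolled; names in angle brackets are placeholders):

$ oc get mcp worker -o jsonpath='{.status.configuration.name}{"\n"}'
$ oc get machineconfig <rendered-worker-old> -o yaml > old.yaml
$ oc get machineconfig <rendered-worker-new> -o yaml > new.yaml
$ diff old.yaml new.yaml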

Actual results:

A MachineConfig update is rolled out to all OpenShift Container Platform 4 - Node(s) after adding CPU and Memory to the Control-Plane Node(s), because nodeStatusUpdateFrequency is being updated, which is unexpected, and it's not clear why it happens.

Expected results:

Either no new MachineConfig should be rolled out after such a change, or a newly rendered MachineConfig should be rolled out with information about what changed and why the change was applied.

Additional info:


Description of problem:

The NS autolabeler should adjust the PSS namespace labels such that a previously permitted workload (based on the SCCs it has access to) can still run.

The autolabeler requires the RoleBinding's .subjects[].namespace to be set when .subjects[].kind is ServiceAccount, even though this is not required by the RBAC system to successfully bind the SA to a Role.

Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.7.0-0.ci-2021-05-21-142747
Server Version: 4.12.0-0.nightly-2022-08-15-150248
Kubernetes Version: v1.24.0+da80cd0

How reproducible: 100%

Steps to Reproduce:

---
apiVersion: v1
kind: Namespace
metadata:
  name: test

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mysa
  namespace: test

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: myrole
  namespace: test
rules:
- apiGroups:
  - security.openshift.io
  resourceNames:
  - privileged
  resources:
  - securitycontextconstraints
  verbs:
  - use

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: myrb
  namespace: test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: myrole
subjects:
- kind: ServiceAccount
  name: mysa
  #namespace: test  # This is required for the autolabeler

---
kind: Job
apiVersion: batch/v1
metadata:
  name: myjob
  namespace: test
spec:
  template:
    spec:
      containers:
        - name: ubi
          image: registry.access.redhat.com/ubi8
          command: ["/bin/bash", "-c"]
          args: ["whoami; sleep infinity"]
      restartPolicy: Never
      securityContext:
        runAsUser: 0
      serviceAccount: mysa
      terminationGracePeriodSeconds: 2

Actual results:

After applying the manifest above, the Job's pod will not start:

$ kubectl -n test describe job/myjob...Events:
  Type     Reason        Age   From            Message
  ----     ------        ----  ----            -------
  Warning  FailedCreate  20s   job-controller  Error creating: pods "myjob-zxcvv" is forbidden: violates PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "ubi" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "ubi" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "ubi" must set securityContext.runAsNonRoot=true), runAsUser=0 (pod must not set runAsUser=0), seccompProfile (pod or container "ubi" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
  Warning  FailedCreate  20s   job-controller  Error creating: pods "myjob-fkb9x" is forbidden: violates PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "ubi" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "ubi" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "ubi" must set securityContext.runAsNonRoot=true), runAsUser=0 (pod must not set runAsUser=0), seccompProfile (pod or container "ubi" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
  Warning  FailedCreate  10s   job-controller  Error creating: pods "myjob-5klpc" is forbidden: violates PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "ubi" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "ubi" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "ubi" must set securityContext.runAsNonRoot=true), runAsUser=0 (pod must not set runAsUser=0), seccompProfile (pod or container "ubi" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

Uncommenting the "namespace" field in the RoleBinding will allow it to start as the autolabeler will adjust the Namespace labels.

However, the namespace field isn't actually required by the RBAC system. Instead of using the autolabeler, the pod can be allowed to run by (w/o uncommenting the field):

$ kubectl label ns/test security.openshift.io/scc.podSecurityLabelSync=false
namespace/test labeled
$ kubectl label ns/test pod-security.kubernetes.io/enforce=privileged --overwrite
namespace/test labeled

 

We now see that the pod is running as root and has access to the privileged scc:

$ kubectl -n test get po -oyaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.129.2.18/23"],"mac_address":"0a:58:0a:81:02:12","gateway_ips":["10.129.2.1"],"ip_address":"10.129.2.18/23","gateway_ip":"10.129.2.1"}}'
      k8s.v1.cni.cncf.io/network-status: |-
        [{
            "name": "ovn-kubernetes",
            "interface": "eth0",
            "ips": [
                "10.129.2.18"
            ],
            "mac": "0a:58:0a:81:02:12",
            "default": true,
            "dns": {}
        }]
      k8s.v1.cni.cncf.io/networks-status: |-
        [{
            "name": "ovn-kubernetes",
            "interface": "eth0",
            "ips": [
                "10.129.2.18"
            ],
            "mac": "0a:58:0a:81:02:12",
            "default": true,
            "dns": {}
        }]
      openshift.io/scc: privileged
    creationTimestamp: "2022-08-16T13:08:24Z"
    generateName: myjob-
    labels:
      controller-uid: 1867dbe6-73b2-44ea-a324-45c9273107b8
      job-name: myjob
    name: myjob-rwjmv
    namespace: test
    ownerReferences:
    - apiVersion: batch/v1
      blockOwnerDeletion: true
      controller: true
      kind: Job
      name: myjob
      uid: 1867dbe6-73b2-44ea-a324-45c9273107b8
    resourceVersion: "36418"
    uid: 39f18dea-31d4-4783-85b5-8ae6a8bec1f4
  spec:
    containers:
    - args:
      - whoami; sleep infinity
      command:
      - /bin/bash
      - -c
      image: registry.access.redhat.com/ubi8
      imagePullPolicy: Always
      name: ubi
      resources: {}
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: kube-api-access-6f2h6
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    imagePullSecrets:
    - name: mysa-dockercfg-mvmtn
    nodeName: ip-10-0-140-172.ec2.internal
    preemptionPolicy: PreemptLowerPriority
    priority: 0
    restartPolicy: Never
    schedulerName: default-scheduler
    securityContext:
      runAsUser: 0
    serviceAccount: mysa
    serviceAccountName: mysa
    terminationGracePeriodSeconds: 2
    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - name: kube-api-access-6f2h6
      projected:
        defaultMode: 420
        sources:
        - serviceAccountToken:
            expirationSeconds: 3607
            path: token
        - configMap:
            items:
            - key: ca.crt
              path: ca.crt
            name: kube-root-ca.crt
        - downwardAPI:
            items:
            - fieldRef:
                apiVersion: v1
                fieldPath: metadata.namespace
              path: namespace
        - configMap:
            items:
            - key: service-ca.crt
              path: service-ca.crt
            name: openshift-service-ca.crt
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2022-08-16T13:08:24Z"
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2022-08-16T13:08:28Z"
      status: "True"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2022-08-16T13:08:28Z"
      status: "True"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2022-08-16T13:08:24Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: cri-o://8fd1c3a5ee565a1089e4e6032bd04bceabb5ab3946c34a2bb55d3ee696baa007
      image: registry.access.redhat.com/ubi8:latest
      imageID: registry.access.redhat.com/ubi8@sha256:08e221b041a95e6840b208c618ae56c27e3429c3dad637ece01c9b471cc8fac6
      lastState: {}
      name: ubi
      ready: true
      restartCount: 0
      started: true
      state:
        running:
          startedAt: "2022-08-16T13:08:28Z"
    hostIP: 10.0.140.172
    phase: Running
    podIP: 10.129.2.18
    podIPs:
    - ip: 10.129.2.18
    qosClass: BestEffort
    startTime: "2022-08-16T13:08:24Z"
kind: List
metadata:
  resourceVersion: ""

 

$ kubectl -n test logs job/myjob
root

 

Expected results:

The autolabeler should properly follow the RoleBinding back to the SCC

 

Additional info:
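Until the autolabeler can resolve ServiceAccount subjects without it, explicitly setting the namespace on the subject works around the issue. A minimal sketch of the adjusted subjects stanza from the RoleBinding in the reproducer above:

subjects:
- kind: ServiceAccount
  name: mysa
  # explicit namespace allows the autolabeler to follow the binding to the SA and its SCC access
  namespace: test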

Description of problem:

On attempting to perform an EUS-to-EUS upgrade from 4.12.z to 4.14 (CI builds), I am consistently seeing that, after upgrading OCP to 4.14, the worker MachineConfigPool goes into a degraded state, complaining about:

message: 'Node c01-dbn-412-tzm44-worker-0-7w6wg is reporting: "failed to run
        nmstatectl: fork/exec /run/machine-config-daemon-bin/nmstatectl: no such file
        or directory", Node c01-dbn-412-tzm44-worker-0-cmqsl is reporting: "failed
        to run nmstatectl: fork/exec /run/machine-config-daemon-bin/nmstatectl: no
        such file or directory", Node c01-dbn-412-tzm44-worker-0-qrp6v is reporting:
        "failed to run nmstatectl: fork/exec /run/machine-config-daemon-bin/nmstatectl:
        no such file or directory"'

The clusterversion then reports an error:
[cloud-user@ocp-psi-executor dbasunag]$ oc get clusterversion
NAME      VERSION                         AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.ci-2023-08-14-110508   True        True          125m    Unable to apply 4.14.0-0.ci-2023-08-14-152624: wait has exceeded 40 minutes for these operators: machine-config
[cloud-user@ocp-psi-executor dbasunag]$
This is consistently reproducible in clusters with knmstate installed.

Version-Release number of selected component (if applicable):

4.12.29 -> 4.13.0-0.ci-2023-08-14-110508->4.14.0-0.ci-2023-08-14-152624

How reproducible:

100%

Steps to Reproduce:

1. Perform an EUS upgrade on a cluster with CNV, ODF, and Knmstate installed
2. After pausing the worker MCP, upgrade OCP, ODF, CNV, and Knmstate to 4.13 - everything worked fine
3. After upgrading OCP to 4.14, once the master MCP is updated, the worker MCP goes into a degraded state and clusterversion eventually reports an error (all the master nodes were updated)

Actual results:

[cloud-user@ocp-psi-executor dbasunag]$ oc get co
NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.0-0.ci-2023-08-14-152624   True        False         False      9h      
baremetal                                  4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
cloud-controller-manager                   4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
cloud-credential                           4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
cluster-autoscaler                         4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
config-operator                            4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
console                                    4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h   
control-plane-machine-set                  4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
csi-snapshot-controller                    4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
dns                                        4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
etcd                                       4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
image-registry                             4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h   
ingress                                    4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h   
insights                                   4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h   
kube-apiserver                             4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h   
kube-controller-manager                    4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h   
kube-scheduler                             4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h   
kube-storage-version-migrator              4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
machine-api                                4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h   
machine-approver                           4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
machine-config                             4.13.0-0.ci-2023-08-14-110508   True        True          True       2d23h   Unable to apply 4.14.0-0.ci-2023-08-14-152624: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool worker is not ready, retrying. Status: (pool degraded: true total: 3, ready 0, updated: 0, unavailable: 3)]]
marketplace                                4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
monitoring                                 4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h   
network                                    4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
node-tuning                                4.14.0-0.ci-2023-08-14-152624   True        False         False      95m     
openshift-apiserver                        4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h   
openshift-controller-manager               4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h   
openshift-samples                          4.14.0-0.ci-2023-08-14-152624   True        False         False      98m     
operator-lifecycle-manager                 4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
operator-lifecycle-manager-catalog         4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
operator-lifecycle-manager-packageserver   4.14.0-0.ci-2023-08-14-152624   True        False         False      2d22h   
service-ca                                 4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
storage                                    4.14.0-0.ci-2023-08-14-152624   True        False         False      2d23h   
[cloud-user@ocp-psi-executor dbasunag]$ 
[cloud-user@ocp-psi-executor dbasunag]$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-693b054330417fe5e098b58716603fc8   True      False      False      3              3                   3                     0                      2d23h
worker   rendered-worker-b2f5a9084e9919b4c1c491658c73bce5   False     False      True       3              0                   0                     3                      2d23h
[cloud-user@ocp-psi-executor dbasunag]$
[cloud-user@ocp-psi-executor dbasunag]$ oc get node
NAME                               STATUS   ROLES                  AGE     VERSION
c01-dbn-412-tzm44-master-0         Ready    control-plane,master   2d23h   v1.27.4+deb2c60
c01-dbn-412-tzm44-master-1         Ready    control-plane,master   2d23h   v1.27.4+deb2c60
c01-dbn-412-tzm44-master-2         Ready    control-plane,master   2d23h   v1.27.4+deb2c60
c01-dbn-412-tzm44-worker-0-7w6wg   Ready    worker                 2d22h   v1.25.11+1485cc9
c01-dbn-412-tzm44-worker-0-cmqsl   Ready    worker                 2d22h   v1.25.11+1485cc9
c01-dbn-412-tzm44-worker-0-qrp6v   Ready    worker                 2d22h   v1.25.11+1485cc9
[cloud-user@ocp-psi-executor dbasunag]$ 

Expected results:

EUS upgrade should work without error

Additional info:

Must-gather can be found here: https://drive.google.com/drive/folders/1SCZoYpGiRpOteTM-sTLmbfgr3hqsICVO?usp=drive_link

Description of problem:

SNO installation does not finish because machine-config is waiting for a non-existing MachineConfig.

 oc get co machine-config
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config             True        True          True       14h     Unable to apply 4.14.0-0.nightly-2023-08-23-075058: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error MachineConfigPool master is not ready, retrying. Status: (pool degraded: true total: 1, ready 0, updated: 0, unavailable: 1)]]

oc -n openshift-machine-config-operator logs machine-config-daemon-2stpc --tail 5
Defaulted container "machine-config-daemon" out of: machine-config-daemon, kube-rbac-proxy
I0824 07:39:12.117508   22874 daemon.go:1370] In bootstrap mode
E0824 07:39:12.117525   22874 writer.go:226] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-231b9341930d0616544ad05989a5c1b8" not found
W0824 07:40:12.131400   22874 daemon.go:1630] Failed to persist NIC names: open /etc/systemd/network: no such file or directory
I0824 07:40:12.131417   22874 daemon.go:1370] In bootstrap mode
E0824 07:40:12.131429   22874 writer.go:226] Marking Degraded due to: machineconfig.machineconfiguration.openshift.io "rendered-master-231b9341930d0616544ad05989a5c1b8" not found

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-23-075058

How reproducible:

100%

Steps to Reproduce:

1. Deploy SNO with Telco DU profile
2. Wait for installation to finish

Actual results:

Installation doesn't complete because the master MCP is degraded, waiting for a non-existing MachineConfig.

Expected results:

Installation succeeds.

Additional info:

Attaching sosreport and must-gather

Description of problem:

The BMH shows as powered off even when the node is up; this is causing the customer's software to behave incorrectly due to the incorrect status on the BMH.

$ oc get bmh -n openshift-machine-api control-1-ru2 -o json | jq '.status|.operationalStatus,.poweredOn,.provisioning.state'
"OK"
false
"externally provisioned"


The following error can be seen:
2023-10-10T06:05:02.554453960Z {"level":"info","ts":1696917902.5544183,"logger":"provisioner.ironic","msg":"could not update node settings in ironic, busy","host":"openshift-machine-api~control-1-ru4"}

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

 

Steps to Reproduce:

1. Launch the cluster with OCP v4.12.32 on Lenovo servers
2.
3.

Actual results:

It gives a false report of the node's power status.

Expected results:

It should report the correct power status of the node.

Additional info:

 

Description of the problem:

According to swagger.yaml, cpu_architecture in infra-envs can include 'multi', but that value only makes sense in the cluster entity.

(Slack thread: https://redhat-internal.slack.com/archives/CUPJTHQ5P/p1680095368006089)

How reproducible:

100%

Steps to reproduce:

1. Check out the swagger.yaml here

Actual results:

 enum: ['x86_64', 'aarch64', 'arm64','ppc64le','s390x','multi']

Expected results:

enum: ['x86_64', 'aarch64', 'arm64','ppc64le','s390x']

Description of problem:

During the documentation writing phase, we have received suggestions to improve texts in the vSphere Connection modal. We should address them.

https://docs.google.com/document/d/1jLnHuJyOR5nyuFTpSO6LcuHDVrVGUSs2EMpLFey1qDQ/edit

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Deploy OCP cluster on the vSphere platform
2. On the homepage of the Console, see VCenter status plugin
3.

Actual results:

 

Expected results:

 

Additional info:

It's about rephrasing only.

Description of problem:


Version-Release number of selected component (if applicable):

 4.13.0-0.nightly-2023-03-17-161027 

How reproducible:

Always

Steps to Reproduce:

1. Create a GCP XPN cluster with flexy job template ipi-on-gcp/versioned-installer-xpn-ci, then run 'oc describe node'

2. Check logs for cloud-network-config-controller pods

Actual results:


 % oc get nodes
NAME                                                          STATUS   ROLES                  AGE    VERSION
huirwang-0309d-r85mj-master-0.c.openshift-qe.internal         Ready    control-plane,master   173m   v1.26.2+06e8c46
huirwang-0309d-r85mj-master-1.c.openshift-qe.internal         Ready    control-plane,master   173m   v1.26.2+06e8c46
huirwang-0309d-r85mj-master-2.c.openshift-qe.internal         Ready    control-plane,master   173m   v1.26.2+06e8c46
huirwang-0309d-r85mj-worker-a-wsrls.c.openshift-qe.internal   Ready    worker                 162m   v1.26.2+06e8c46
huirwang-0309d-r85mj-worker-b-5txgq.c.openshift-qe.internal   Ready    worker                 162m   v1.26.2+06e8c46
In the `oc describe node` output, there are no related egressIP annotations:
% oc describe node huirwang-0309d-r85mj-worker-a-wsrls.c.openshift-qe.internal 
Name:               huirwang-0309d-r85mj-worker-a-wsrls.c.openshift-qe.internal
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=n2-standard-4
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-central1
                    failure-domain.beta.kubernetes.io/zone=us-central1-a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=huirwang-0309d-r85mj-worker-a-wsrls.c.openshift-qe.internal
                    kubernetes.io/os=linux
                    machine.openshift.io/interruptible-instance=
                    node-role.kubernetes.io/worker=
                    node.kubernetes.io/instance-type=n2-standard-4
                    node.openshift.io/os_id=rhcos
                    topology.gke.io/zone=us-central1-a
                    topology.kubernetes.io/region=us-central1
                    topology.kubernetes.io/zone=us-central1-a
Annotations:        csi.volume.kubernetes.io/nodeid:
                      {"pd.csi.storage.gke.io":"projects/openshift-qe/zones/us-central1-a/instances/huirwang-0309d-r85mj-worker-a-wsrls"}
                    k8s.ovn.org/host-addresses: ["10.0.32.117"]
                    k8s.ovn.org/l3-gateway-config:
                      {"default":{"mode":"shared","interface-id":"br-ex_huirwang-0309d-r85mj-worker-a-wsrls.c.openshift-qe.internal","mac-address":"42:01:0a:00:...
                    k8s.ovn.org/node-chassis-id: 7fb1870c-4315-4dcb-910c-0f45c71ad6d3
                    k8s.ovn.org/node-gateway-router-lrp-ifaddr: {"ipv4":"100.64.0.5/16"}
                    k8s.ovn.org/node-mgmt-port-mac-address: 16:52:e3:8c:13:e2
                    k8s.ovn.org/node-primary-ifaddr: {"ipv4":"10.0.32.117/32"}
                    k8s.ovn.org/node-subnets: {"default":["10.131.0.0/23"]}
                    machine.openshift.io/machine: openshift-machine-api/huirwang-0309d-r85mj-worker-a-wsrls
                    machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
                    machineconfiguration.openshift.io/currentConfig: rendered-worker-bec5065070ded51e002c566a9c5bd16a
                    machineconfiguration.openshift.io/desiredConfig: rendered-worker-bec5065070ded51e002c566a9c5bd16a
                    machineconfiguration.openshift.io/desiredDrain: uncordon-rendered-worker-bec5065070ded51e002c566a9c5bd16a
                    machineconfiguration.openshift.io/lastAppliedDrain: uncordon-rendered-worker-bec5065070ded51e002c566a9c5bd16a
                    machineconfiguration.openshift.io/reason: 
                    machineconfiguration.openshift.io/state: Done
                    volumes.kubernetes.io/controller-managed-attach-detach: true


 % oc logs cloud-network-config-controller-5cd96d477d-2kmc9  -n openshift-cloud-network-config-controller  
W0320 03:00:08.981493       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0320 03:00:08.982280       1 leaderelection.go:248] attempting to acquire leader lease openshift-cloud-network-config-controller/cloud-network-config-controller-lock...
E0320 03:00:38.982868       1 leaderelection.go:330] error retrieving resource lock openshift-cloud-network-config-controller/cloud-network-config-controller-lock: Get "https://api-int.huirwang-0309d.qe.gcp.devcluster.openshift.com:6443/api/v1/namespaces/openshift-cloud-network-config-controller/configmaps/cloud-network-config-controller-lock": dial tcp: lookup api-int.huirwang-0309d.qe.gcp.devcluster.openshift.com: i/o timeout
E0320 03:01:23.863454       1 leaderelection.go:330] error retrieving resource lock openshift-cloud-network-config-controller/cloud-network-config-controller-lock: Get "https://api-int.huirwang-0309d.qe.gcp.devcluster.openshift.com:6443/api/v1/namespaces/openshift-cloud-network-config-controller/configmaps/cloud-network-config-controller-lock": dial tcp: lookup api-int.huirwang-0309d.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: read udp 10.129.0.14:52109->172.30.0.10:53: read: connection refused
I0320 03:02:19.249359       1 leaderelection.go:258] successfully acquired lease openshift-cloud-network-config-controller/cloud-network-config-controller-lock
I0320 03:02:19.250662       1 controller.go:88] Starting node controller
I0320 03:02:19.250681       1 controller.go:91] Waiting for informer caches to sync for node workqueue
I0320 03:02:19.250693       1 controller.go:88] Starting secret controller
I0320 03:02:19.250703       1 controller.go:91] Waiting for informer caches to sync for secret workqueue
I0320 03:02:19.250709       1 controller.go:88] Starting cloud-private-ip-config controller
I0320 03:02:19.250715       1 controller.go:91] Waiting for informer caches to sync for cloud-private-ip-config workqueue
I0320 03:02:19.258642       1 controller.go:182] Assigning key: huirwang-0309d-r85mj-master-2.c.openshift-qe.internal to node workqueue
I0320 03:02:19.258671       1 controller.go:182] Assigning key: huirwang-0309d-r85mj-master-1.c.openshift-qe.internal to node workqueue
I0320 03:02:19.258682       1 controller.go:182] Assigning key: huirwang-0309d-r85mj-master-0.c.openshift-qe.internal to node workqueue
I0320 03:02:19.351258       1 controller.go:96] Starting node workers
I0320 03:02:19.351303       1 controller.go:102] Started node workers
I0320 03:02:19.351298       1 controller.go:96] Starting secret workers
I0320 03:02:19.351331       1 controller.go:102] Started secret workers
I0320 03:02:19.351265       1 controller.go:96] Starting cloud-private-ip-config workers
I0320 03:02:19.351508       1 controller.go:102] Started cloud-private-ip-config workers
E0320 03:02:19.589704       1 controller.go:165] error syncing 'huirwang-0309d-r85mj-master-1.c.openshift-qe.internal': error retrieving the private IP configuration for node: huirwang-0309d-r85mj-master-1.c.openshift-qe.internal, err: error retrieving the network interface subnets, err: googleapi: Error 404: The resource 'projects/openshift-qe/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' was not found, notFound, requeuing in node workqueue
E0320 03:02:19.615551       1 controller.go:165] error syncing 'huirwang-0309d-r85mj-master-0.c.openshift-qe.internal': error retrieving the private IP configuration for node: huirwang-0309d-r85mj-master-0.c.openshift-qe.internal, err: error retrieving the network interface subnets, err: googleapi: Error 404: The resource 'projects/openshift-qe/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' was not found, notFound, requeuing in node workqueue
E0320 03:02:19.644628       1 controller.go:165] error syncing 'huirwang-0309d-r85mj-master-2.c.openshift-qe.internal': error retrieving the private IP configuration for node: huirwang-0309d-r85mj-master-2.c.openshift-qe.internal, err: error retrieving the network interface subnets, err: googleapi: Error 404: The resource 'projects/openshift-qe/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' was not found, notFound, requeuing in node workqueue
E0320 03:02:19.774047       1 controller.go:165] error syncing 'huirwang-0309d-r85mj-master-0.c.openshift-qe.internal': error retrieving the private IP configuration for node: huirwang-0309d-r85mj-master-0.c.openshift-qe.internal, err: error retrieving the network interface subnets, err: googleapi: Error 404: The resource 'projects/openshift-qe/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' was not found, notFound, requeuing in node workqueue
E0320 03:02:19.783309       1 controller.go:165] error syncing 'huirwang-0309d-r85mj-master-1.c.openshift-qe.internal': error retrieving the private IP configuration for node: huirwang-0309d-r85mj-master-1.c.openshift-qe.internal, err: error retrieving the network interface subnets, err: googleapi: Error 404: The resource 'projects/openshift-qe/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' was not found, notFound, requeuing in node workqueue
E0320 03:02:19.816430       1 controller.go:165] error syncing 'huirwang-0309d-r85mj-master-2.c.openshift-qe.internal': error retrieving the private IP configuration for node: huirwang-0309d-r85mj-master-2.c.openshift-qe.internal, err: error retrieving the network interface subnets, err: googleapi: Error 404: The resource 'projects/openshift-qe/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' was not found, notFound, requeuing in node workqueue

Expected results:

EgressIP should work

Additional info:

It can be reproduced in 4.12 as well, so it is not a regression.
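For comparison, on a cluster where cloud-network-config-controller works, each node is expected to carry an egress IP configuration annotation set by that controller, roughly like the following (the values are illustrative, and the exact JSON layout differs by platform and version):

    k8s.ovn.org/host-addresses: ["10.0.32.117"]
    cloud.network.openshift.io/egress-ipconfig: '[{"interface":"nic0","ifaddr":{"ipv4":"10.0.32.0/19"},"capacity":{"ip":10}}]'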

Due to the EOL of RHV in OCP, we'll need to disable oVirt as an installation option in the installer.
Note: The first step is disabling it. Removing all related code from the installer will be done in a later release.

I tried to update my cluster from 4.12.0 to 4.12.2, and this resulted in a crashlooping state for both prometheus-adapter pods. I tried to downgrade back to 4.12.0 and then upgrade to 4.12.4, but neither approach solved the situation.

 

What I can see in the logs of the adapters is the following:

 

I0216 15:24:59.144559 1 adapter.go:114] successfully using in-cluster auth
I0216 15:25:00.345620 1 request.go:601] Waited for 1.180640418s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/operators.coreos.com/v1alpha1?timeout=32s
I0216 15:25:10.345634 1 request.go:601] Waited for 11.180149045s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/triggers.tekton.dev/v1beta1?timeout=32s
I0216 15:25:20.346048 1 request.go:601] Waited for 2.597453714s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/apiextensions.k8s.io/v1?timeout=32s
I0216 15:25:30.347435 1 request.go:601] Waited for 12.598768922s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/machine.openshift.io/v1beta1?timeout=32s
I0216 15:25:40.545767 1 request.go:601] Waited for 22.797001115s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/samples.operator.openshift.io/v1?timeout=32s
I0216 15:25:50.546588 1 request.go:601] Waited for 32.797748538s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/metrics.k8s.io/v1beta1?timeout=32s
I0216 15:25:56.041594 1 secure_serving.go:210] Serving securely on [::]:6443
I0216 15:25:56.042265 1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/etc/tls/private/tls.crt::/etc/tls/private/tls.key"
I0216 15:25:56.042971 1 dynamic_cafile_content.go:157] "Starting controller" name="request-header::/etc/tls/private/requestheader-client-ca-file"
I0216 15:25:56.043309 1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0216 15:25:56.043310 1 object_count_tracker.go:84] "StorageObjectCountTracker pruner is exiting"
I0216 15:25:56.043398 1 dynamic_serving_content.go:146] "Shutting down controller" name="serving-cert::/etc/tls/private/tls.crt::/etc/tls/private/tls.key"
I0216 15:25:56.043562 1 tlsconfig.go:255] "Shutting down DynamicServingCertificateController"
I0216 15:25:56.043606 1 dynamic_cafile_content.go:157] "Starting controller" name="client-ca-bundle::/etc/tls/private/client-ca-file"
I0216 15:25:56.043614 1 secure_serving.go:255] Stopped listening on [::]:6443
I0216 15:25:56.043621 1 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/tls/private/client-ca-file"
I0216 15:25:56.043635 1 dynamic_cafile_content.go:171] "Shutting down controller" name="request-header::/etc/tls/private/requestheader-client-ca-file"

I also tried to search online for known issues and bugs and found this one that might be related:

https://github.com/kubernetes-sigs/metrics-server/issues/983

I also tried rebooting the server but it didn't help.

At a minimum, a workaround is needed, because at the moment the cluster upgrade is still pending.

Description of problem:

In a freshly installed cluster, we can see the CVO hot-looping on the CronJob openshift-operator-lifecycle-manager/collect-profiles and the Service openshift-monitoring/cluster-monitoring-operator.

  1. grep -o 'Updating .*due to diff' cvo2.log | sort | uniq -c
    18 Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff
    12 Updating Service openshift-monitoring/cluster-monitoring-operator due to diff

Looking at the CronJob hot-looping

# grep -A60 'Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff' cvo2.log | tail -n61
I0110 06:32:44.489277       1 generic.go:109] Updating CronJob openshift-operator-lifecycle-manager/collect-profiles due to diff:   &unstructured.Unstructured{
  	Object: map[string]interface{}{
  		"apiVersion": string("batch/v1"),
  		"kind":       string("CronJob"),
  		"metadata":   map[string]interface{}{"annotations": map[string]interface{}{"include.release.openshift.io/ibm-cloud-managed": string("true"), "include.release.openshift.io/self-managed-high-availability": string("true")}, "creationTimestamp": string("2022-01-10T04:35:19Z"), "generation": int64(1), "managedFields": []interface{}{map[string]interface{}{"apiVersion": string("batch/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:metadata": map[string]interface{}{"f:annotations": map[string]interface{}{".": map[string]interface{}{}, "f:include.release.openshift.io/ibm-cloud-managed": map[string]interface{}{}, "f:include.release.openshift.io/self-managed-high-availability": map[string]interface{}{}}, "f:ownerReferences": map[string]interface{}{".": map[string]interface{}{}, `k:{"uid":"334d6c04-126d-4271-96ec-d303e93b7d1c"}`: map[string]interface{}{}}}, "f:spec": map[string]interface{}{"f:concurrencyPolicy": map[string]interface{}{}, "f:failedJobsHistoryLimit": map[string]interface{}{}, "f:jobTemplate": map[string]interface{}{"f:spec": map[string]interface{}{"f:template": map[string]interface{}{"f:spec": map[string]interface{}{"f:containers": map[string]interface{}{`k:{"name":"collect-profiles"}`: map[string]interface{}{".": map[string]interface{}{}, "f:args": map[string]interface{}{}, "f:command": map[string]interface{}{}, "f:image": map[string]interface{}{}, ...}}, "f:dnsPolicy": map[string]interface{}{}, "f:priorityClassName": map[string]interface{}{}, "f:restartPolicy": map[string]interface{}{}, ...}}}}, "f:schedule": map[string]interface{}{}, ...}}, "manager": string("cluster-version-operator"), ...}, map[string]interface{}{"apiVersion": string("batch/v1"), "fieldsType": string("FieldsV1"), "fieldsV1": map[string]interface{}{"f:status": map[string]interface{}{"f:lastScheduleTime": map[string]interface{}{}, "f:lastSuccessfulTime": map[string]interface{}{}}}, "manager": string("kube-controller-manager"), ...}}, ...},
  		"spec": map[string]interface{}{
+ 			"concurrencyPolicy":      string("Allow"),
+ 			"failedJobsHistoryLimit": int64(1),
  			"jobTemplate": map[string]interface{}{
+ 				"metadata": map[string]interface{}{"creationTimestamp": nil},
  				"spec": map[string]interface{}{
  					"template": map[string]interface{}{
+ 						"metadata": map[string]interface{}{"creationTimestamp": nil},
  						"spec": map[string]interface{}{
  							"containers": []interface{}{
  								map[string]interface{}{
  									... // 4 identical entries
  									"name":                     string("collect-profiles"),
  									"resources":                map[string]interface{}{"requests": map[string]interface{}{"cpu": string("10m"), "memory": string("80Mi")}},
+ 									"terminationMessagePath":   string("/dev/termination-log"),
+ 									"terminationMessagePolicy": string("File"),
  									"volumeMounts":             []interface{}{map[string]interface{}{"mountPath": string("/etc/config"), "name": string("config-volume")}, map[string]interface{}{"mountPath": string("/var/run/secrets/serving-cert"), "name": string("secret-volume")}},
  								},
  							},
+ 							"dnsPolicy":                     string("ClusterFirst"),
  							"priorityClassName":             string("openshift-user-critical"),
  							"restartPolicy":                 string("Never"),
+ 							"schedulerName":                 string("default-scheduler"),
+ 							"securityContext":               map[string]interface{}{},
+ 							"serviceAccount":                string("collect-profiles"),
  							"serviceAccountName":            string("collect-profiles"),
+ 							"terminationGracePeriodSeconds": int64(30),
  							"volumes": []interface{}{
  								map[string]interface{}{
  									"configMap": map[string]interface{}{
+ 										"defaultMode": int64(420),
  										"name":        string("collect-profiles-config"),
  									},
  									"name": string("config-volume"),
  								},
  								map[string]interface{}{
  									"name": string("secret-volume"),
  									"secret": map[string]interface{}{
+ 										"defaultMode": int64(420),
  										"secretName":  string("pprof-cert"),
  									},
  								},
  							},
  						},
  					},
  				},
  			},
  			"schedule":                   string("*/15 * * * *"),
+ 			"successfulJobsHistoryLimit": int64(3),
+ 			"suspend":                    bool(false),
  		},
  		"status": map[string]interface{}{"lastScheduleTime": string("2022-01-10T06:30:00Z"), "lastSuccessfulTime": string("2022-01-10T06:30:11Z")},
  	},
  }
I0110 06:32:44.499764       1 sync_worker.go:771] Done syncing for cronjob "openshift-operator-lifecycle-manager/collect-profiles" (574 of 765)
I0110 06:32:44.499814       1 sync_worker.go:759] Running sync for deployment "openshift-operator-lifecycle-manager/olm-operator" (575 of 765)

Extract the manifest:

# cat 0000_50_olm_07-collect-profiles.cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
  name: collect-profiles
  namespace: openshift-operator-lifecycle-manager
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: collect-profiles
          priorityClassName: openshift-user-critical
          containers:
            - name: collect-profiles
              image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2a8d116943a7c1eb32cd161a0de5cb173713724ff419a03abe0382a2d5d9c9a7
              imagePullPolicy: IfNotPresent
              command:
                - bin/collect-profiles
              args:
                - -n
                - openshift-operator-lifecycle-manager
                - --config-mount-path
                - /etc/config
                - --cert-mount-path
                - /var/run/secrets/serving-cert
                - olm-operator-heap-:https://olm-operator-metrics:8443/debug/pprof/heap
                - catalog-operator-heap-:https://catalog-operator-metrics:8443/debug/pprof/heap
              volumeMounts:
                - mountPath: /etc/config
                  name: config-volume
                - mountPath: /var/run/secrets/serving-cert
                  name: secret-volume
              resources:
                requests:
                  cpu: 10m
                  memory: 80Mi
          volumes:
            - name: config-volume
              configMap:
                name: collect-profiles-config
            - name: secret-volume
              secret:
                secretName: pprof-cert
          restartPolicy: Never

Looking at the in-cluster object:

# oc get cronjob.batch/collect-profiles -oyaml -n openshift-operator-lifecycle-manager
apiVersion: batch/v1
kind: CronJob
metadata:
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
  creationTimestamp: "2022-01-10T04:35:19Z"
  generation: 1
  name: collect-profiles
  namespace: openshift-operator-lifecycle-manager
  ownerReferences:
  - apiVersion: config.openshift.io/v1
    kind: ClusterVersion
    name: version
    uid: 334d6c04-126d-4271-96ec-d303e93b7d1c
  resourceVersion: "450801"
  uid: d0b92cd3-3213-466c-921c-d4c4c77f7a6b
spec:
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
        spec:
          containers:
          - args:
            - -n
            - openshift-operator-lifecycle-manager
            - --config-mount-path
            - /etc/config
            - --cert-mount-path
            - /var/run/secrets/serving-cert
            - olm-operator-heap-:https://olm-operator-metrics:8443/debug/pprof/heap
            - catalog-operator-heap-:https://catalog-operator-metrics:8443/debug/pprof/heap
            command:
            - bin/collect-profiles
            image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2a8d116943a7c1eb32cd161a0de5cb173713724ff419a03abe0382a2d5d9c9a7
            imagePullPolicy: IfNotPresent
            name: collect-profiles
            resources:
              requests:
                cpu: 10m
                memory: 80Mi
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /etc/config
              name: config-volume
            - mountPath: /var/run/secrets/serving-cert
              name: secret-volume
          dnsPolicy: ClusterFirst
          priorityClassName: openshift-user-critical
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          serviceAccount: collect-profiles
          serviceAccountName: collect-profiles
          terminationGracePeriodSeconds: 30
          volumes:
          - configMap:
              defaultMode: 420
              name: collect-profiles-config
            name: config-volume
          - name: secret-volume
            secret:
              defaultMode: 420
              secretName: pprof-cert
  schedule: '*/15 * * * *'
  successfulJobsHistoryLimit: 3
  suspend: false
status:
  lastScheduleTime: "2022-01-11T03:00:00Z"
  lastSuccessfulTime: "2022-01-11T03:00:07Z"

Version-Release number of the following components:
4.10.0-0.nightly-2022-01-09-195852

How reproducible:
1/1

Steps to Reproduce:
1. Install a 4.10 cluster
2. Grep 'Updating .*due to diff' in the CVO log to check for hot-looping
3.

Actual results:
CVO hotloops on CronJob openshift-operator-lifecycle-manager/collect-profiles

Expected results:
CVO should not hotloop on it in a fresh installed cluster

Additional info:
attachment 1850058 CVO log file
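A likely direction for a fix (an assumption based on the diff above, not a confirmed change) is to set the API-server-defaulted fields explicitly in 0000_50_olm_07-collect-profiles.cronjob.yaml so the CVO no longer sees a difference between the manifest and the in-cluster object. A partial sketch of the manifest's spec with those defaults spelled out:

spec:
  schedule: "*/15 * * * *"
  concurrencyPolicy: Allow
  failedJobsHistoryLimit: 1
  successfulJobsHistoryLimit: 3
  suspend: false
  jobTemplate:
    spec:
      template:
        spec:
          # ...existing containers and volumes unchanged, plus the defaults the diff flags...
          serviceAccount: collect-profiles
          serviceAccountName: collect-profiles
          priorityClassName: openshift-user-critical
          restartPolicy: Never
          dnsPolicy: ClusterFirst
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30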

Description of problem:

An incorrect AWS ARN prefix [1] is used for GovCloud and AWS China regions, which causes the command `ccoctl aws create-all` to fail:

Failed to create Identity provider: failed to apply public access policy to the bucket ci-op-bb5dgq54-77753-oidc: MalformedPolicy: Policy has invalid resource
	status code: 400, request id: VNBZ3NYDH6YXWFZ3, host id: pHF8v7C3vr9YJdD9HWamFmRbMaOPRbHSNIDaXUuUyrgy0gKCO9DDFU/Xy8ZPmY2LCjfLQnUDmtQ=

Correct AWS ARN prefix:
GovCloud (us-gov-east-1 and us-gov-west-1): arn:aws-us-gov
AWS China (cn-north-1 and cn-northwest-1): arn:aws-cn

[1] https://github.com/openshift/cloud-credential-operator/pull/526/files#diff-1909afc64595b92551779d9be99de733f8b694cfb6e599e49454b380afc58876R211
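For illustration only (the statement is an assumption about what ccoctl applies, but the bucket name comes from the error above): on GovCloud the public-access policy resource would need the partition-specific prefix, roughly:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws-us-gov:s3:::ci-op-bb5dgq54-77753-oidc/*"
    }
  ]
}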


 

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2023-05-11-024616

How reproducible:

Always
 

Steps to Reproduce:

1. Run the command `ccoctl aws create-all --name="${infra_name}" --region="${REGION}" --credentials-requests-dir="/tmp/credrequests" --output-dir="/tmp"` in a GovCloud region
2.
3.

Actual results:

Failed to create Identity provider
 

Expected results:

Create resources successfully.
 

Additional info:

Related PRs:
4.10: https://github.com/openshift/cloud-credential-operator/pull/531
4.11: https://github.com/openshift/cloud-credential-operator/pull/530
4.12: https://github.com/openshift/cloud-credential-operator/pull/529
4.13: https://github.com/openshift/cloud-credential-operator/pull/528
4.14: https://github.com/openshift/cloud-credential-operator/pull/526
 

Description of problem:

The Helm view in the Dev console doesn't allow you to edit Helm repositories through the three-dot menu's "Edit" option; it results in a 404.

Prerequisites (if any, like setup, operators/versions):

Tried in 4.13 only, not sure if other versions are affected

Steps to Reproduce

1. Create a new Helm chart repository (/ns/<NAMESPACE>/helmchartrepositories/~new/form endpoint)
2. List all the custom Helm repositories ( /helm-releases/ns/<NAMESPACE>/repositories endpoint)
3. Click three dots menu on the right of any chart repository and select "Edit ProjectHelmChartRepository" (leads to /k8s/ns/<NAMESPACE>/helmchartrepositories/<REPO_NAME>/edit)
4. You land on 404 page

Actual results:

404 page, see the attached GIF

Expected results:

Edit view

Reproducibility (Always/Intermittent/Only Once):

Always

Build Details:

Observed in OCP 4.13 (Dev sandbox and OpenShift Local)

Workaround:

Follow steps 1 and 2 from the reproducer above, then:
3. Click on Helm repository name
4. Click YAML tab to edit resource (/k8s/ns/<NAMESPACE>/helm.openshift.io~v1beta1~ProjectHelmChartRepository/<REPO_NAME>/yaml endpoint)

Additional info:

Description of problem:

Update the VScode extension link to https://marketplace.visualstudio.com/items?itemName=redhat.vscode-openshift-connector

 

And change the description to 

The OpenShift Serverless Functions support in the VSCode IDE extension enables developers to effortlessly create, build, run, invoke and deploy serverless functions on OpenShift, providing a seamless development experience within the familiar VSCode environment.

This is a clone of issue OCPBUGS-18105. The following is the description of the original issue:

Description of problem:

The IBM VPC CSI Driver fails to provision volumes in a proxy cluster. If I understand correctly, it seems the proxy is not injected, because in our definition (https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/controller.yaml) we are injecting the proxy into a container named csi-driver:
    config.openshift.io/inject-proxy: csi-driver
    config.openshift.io/inject-proxy-cabundle: csi-driver
but the container name is iks-vpc-block-driver in https://github.com/openshift/ibm-vpc-block-csi-driver-operator/blob/master/assets/controller.yaml#L153

I checked, and the proxy is not defined in the controller pod or in the driver container's environment.
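Assuming the root cause is the name mismatch described above, the fix presumably amounts to pointing the inject-proxy annotations at the actual container name; a sketch of the adjusted annotations in assets/controller.yaml:

    config.openshift.io/inject-proxy: iks-vpc-block-driver
    config.openshift.io/inject-proxy-cabundle: iks-vpc-block-driver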

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

Always

Steps to Reproduce:

1. Create IBM cluster with proxy setting
2. create pvc/pod with IBM VPC CSI Driver 

Actual results:

It fails to provision the volume.

Expected results:

Volume provisioning works on a proxy cluster.

Additional info:

 

Description of problem:

When we delete any CR from the common OCP operator page, it would be good to add an indication that the resource is being deleted, or at least to grey out the kebab menu dot at the right corner, from the user's perspective.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

 

Steps to Reproduce:

1. Go to Operators -> installed operators -> click any installed operator -> click CRD name from header tab -> delete any CR from list page using kebab menu.
2. There is no indication about the deletion; the user can perform any action even after the deletion is triggered.

Actual results:

No indication about the deletion on the kebab menu.

Expected results:

Grey out the kebab menu dot and display a tooltip about the deletion.

Additional info:

https://github.com/openshift/console/pull/11860 does not fix this issue for the operator page.

Description of problem:

If the appsDomain (https://docs.openshift.com/container-platform/4.13/networking/ingress-operator.html#nw-ingress-configuring-application-domain_configuring-ingress) is specified and a cluster-admin accidentally deletes all routes on a cluster, the canary route in the namespace openshift-ingress-canary is recreated with the domain specified in .spec.appsDomain instead of the .spec.domain of the Ingress.config.openshift.io definition.
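Both fields live on the cluster-scoped Ingress config. A minimal sketch with hypothetical domain values, where the canary route should be recreated under spec.domain rather than spec.appsDomain:

apiVersion: config.openshift.io/v1
kind: Ingress
metadata:
  name: cluster
spec:
  domain: apps.cluster.example.com        # default cluster domain; operator-managed routes such as canary should use this
  appsDomain: apps.mycompany.example.com  # optional override for new routes created without an explicit host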

Additionally, the docs are a bit confusing. On one page (https://docs.openshift.com/container-platform/4.13/networking/ingress-operator.html#nw-ingress-configuring-application-domain_configuring-ingress) it's described as:

As a cluster administrator, you can specify an alternative to the default cluster domain for user-created routes by configuring the appsDomain field. The appsDomain field is an optional domain for OpenShift Container Platform to use instead of the default, which is specified in the domain field. If you specify an alternative domain, it overrides the default cluster domain for the purpose of determining the default host for a new route.

For example, you can use the DNS domain for your company as the default domain for routes and ingresses for applications running on your cluster.

In the API spec (https://docs.openshift.com/container-platform/4.11/rest_api/config_apis/ingress-config-openshift-io-v1.html#spec) the correct behaviour is explained

appsDomain is an optional domain to use instead of the one specified in the domain field when a Route is created without specifying an explicit host. If appsDomain is nonempty, this value is used to generate default host values for Route. Unlike domain, appsDomain may be modified after installation. This assumes a new ingresscontroller has been setup with a wildcard certificate.

It would be nice if the wording could be adjusted, as `you can specify an alternative to the default cluster domain for user-created routes by configuring` does not fit well: more or less all newly created routes (operator-created and so on) get created with the appsDomain.

Version-Release number of selected component (if applicable):
OpenShift 4.12.22

How reproducible:

see steps below

Steps to Reproduce:

1. Install OpenShift
2. define .spec.appsDomain in Ingress.config.openshift.io
3. oc delete route canary -n openshift-ingress-canary
4. wait some seconds to get the route recreated and check cluster-operator

Actual results:

Ingress Operator degraded and route recreated with wrong domain (.spec.appsDomain)

Expected results:

Ingress Operator not degraded and route recreated with the correct domain (.spec.domain)

Additional info:

Please see screenshot

This is a clone of issue OCPBUGS-21757

Description of problem:

 

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Run pj-rehearse on https://github.com/openshift/release/pull/44417
2. Observe "lint" rehearsals fail

Actual results:

"lint" rehearsals fail with OOMKilled

Expected results:

"lint" rehearsals pass

Additional info:

 

Description of problem:

In a 7-day reliability test, kube-apiserver's memory usage keeps increasing; the maximum is over 3 GB.
In our 4.12 test result, the kube-apiserver's memory usage was stable at around 1.7 GB and did not keep increasing.
I'll redo the test on a 4.12.0 build to see if I can reproduce this issue.

I'll do a longer than 7 days test to see how high the memory can grow up.

About Reliability Test
https://github.com/openshift/svt/tree/master/reliability-v2

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-03-14-053612

How reproducible:

Always

Steps to Reproduce:

1. Install an AWS cluster with m5.xlarge type
2. Run reliability test for 7 days
Reliability Test Configuration example:
https://github.com/openshift/svt/tree/master/reliability-v2#groups-and-tasks-1
Config used in this test:
admin: 1 user
dev-test: 15 users
dev-prod: 1 user 
3. Use dittybopper dashboard to monitor the kube-apiserver's memory usage

Actual results:

kube-apiserver's memory usage keeps increasing; the maximum is over 3 GB.

Expected results:

kube-apiserver's memory usage should not keep increasing

Additional info:

Screenshots are uploaded to shared folder OCPBUGS-10829 - Google Drive

413-kube-apiserver-memory.png
413-api-performance-last2d.png - test was stopped on [2023-03-24 04:21:10 UTC]
412-kube-apiserver-memory.png
must-gather.local.525817950490593011.tar.gz - 4.13 cluster's must gather

We faced an issue where the quota for VPC endpoints (VPCE) was reached. This is visible in the status of the AWSEndpointService:

  - lastTransitionTime: "2023-03-01T10:23:08Z"
    message: 'failed to create vpc endpoint: VpcEndpointLimitExceeded'
    reason: AWSError
    status: "False"
    type: EndpointAvailable

but it should be propagated to the HostedCluster, as it blocks worker creation (ignition was not working), and for better visibility.

 

Description of problem:

When authenticating openshift-install with the gcloud CLI, rather than using a service account key file, the installer throws an error because https://github.com/openshift/installer/blob/master/pkg/asset/machines/gcp/machines.go#L170-L178 ALWAYS expects to extract a service account to pass through to the nodes in XPN installs.

An alternative approach would be to handle the lack of a service account without error, and allow the required service accounts to be passed in through another mechanism.
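For context, a shared-VPC (XPN) install-config looks roughly like the following (project, network, and subnet names are hypothetical); it is with this kind of config, authenticated via gcloud rather than a key file, that the service account extraction fails:

apiVersion: v1
baseDomain: example.com
metadata:
  name: xpn-cluster
platform:
  gcp:
    projectID: service-project-id
    region: us-central1
    networkProjectID: host-project-id       # shared VPC (XPN) host project
    network: shared-vpc-network
    controlPlaneSubnet: control-plane-subnet
    computeSubnet: compute-subnet
pullSecret: '...'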

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Create install config for gcp xpn install
2. Authenticate installer without service account key file (either gcloud cli auth or through a VM).
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

In https://github.com/openshift/cluster-baremetal-operator/blob/master/provisioning/utils.go#L65 we reference the .PlatformStatus.BareMetal.APIServerInternalIP attribute from the config API. Meanwhile, a recent change, https://github.com/openshift/api/commit/51f399230d604fa013c2bb341040c4ad126e7309, deprecated this field in favour of .APIServerInternalIPs (note the plural); this was done to better suit the dual-stack case.
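For reference, on a dual-stack baremetal cluster the infrastructure status exposes both the deprecated singular field and the new plural one, roughly like this (the addresses are illustrative):

status:
  platformStatus:
    type: BareMetal
    baremetal:
      apiServerInternalIP: 192.0.2.5    # deprecated singular field
      apiServerInternalIPs:             # new plural field, suits dual stack
      - 192.0.2.5
      - fd00:1234::5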

We need to update the code (trivial) along with the vendor dependencies (openshift/api needs a bump to a version equal to or later than the one including the commit referenced above). There will likely be code changes required in CBO to adapt to the newer API package.

Slack threads for reference: https://app.slack.com/client/T027F3GAJ/C01RJHA6BRC/thread/C01RJHA6BRC-1661416223.353009 (vendor dependency update)

openshift/api change:
https://coreos.slack.com/archives/C01RJHA6BRC/p1660573560434409?thread_ts=1660229723.998839&cid=C01RJHA6BRC

IMPORTANT NOTE: there is an in-flight PR which is making changes to the CBO code fetching the VIP: https://github.com/openshift/cluster-baremetal-operator/pull/285.

Work done to address this bug needs to be stacked on top of this to avoid duplication of effort (the easiest way is to work on the code from the in-flight PR285 and merge once PR285 merges)

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

With the recent update in the logic for considering a CPMS replica Ready only when both the backing Machine is running and the backing Node is Ready: https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/171, we now need to watch nodes at all times to detect nodes transitioning in readiness.

The majority of occurrences of this issue have been fixed with: https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/177 (https://issues.redhat.com//browse/OCPBUGS-10032) but we also need to watch the control plane nodes at steady state (when they are already Ready), to notice if they go UnReady at any point, as relying on control plane machine events is not enough (they might be Running, while the Node has transitioned to NotReady).

Version-Release number of selected component (if applicable):

4.13, 4.14

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

The dependencies for the Ironic containers are quite old; we need to upgrade them to the latest available versions to keep up with upstream requirements.

Description of problem:

Backport https://github.com/kubernetes/kubernetes/pull/117371

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-20063. The following is the description of the original issue:

Description of problem:

An infra object in some vsphere deployments can look like this:

~]$ oc get infrastructure cluster -o json | jq .status
{
  "apiServerInternalURI": "xxx",
  "apiServerURL": "xxx",
  "controlPlaneTopology": "HighlyAvailable",
  "etcdDiscoveryDomain": "",
  "infrastructureName": "xxx",
  "infrastructureTopology": "HighlyAvailable",
  "platform": "VSphere",
  "platformStatus": {
    "type": "VSphere" 
  }
}

Which if we attempt to run the regenerate MCO command in https://access.redhat.com/articles/regenerating_cluster_certificates will cause a panic
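
A minimal sketch of the defensive check (types trimmed down to the relevant shape, not the actual code path that panics): guard against the per-platform section of platformStatus being nil before dereferencing it.

```
// Illustration only; the real types live in openshift/api.
package main

import "fmt"

type vspherePlatformStatus struct {
	APIServerInternalIPs []string
}

type platformStatus struct {
	Type    string
	VSphere *vspherePlatformStatus // nil in the infra object shown above
}

// apiServerIPs returns an error instead of panicking when the vSphere
// section of the platform status is not populated.
func apiServerIPs(ps *platformStatus) ([]string, error) {
	if ps == nil || ps.VSphere == nil {
		return nil, fmt.Errorf("no vSphere platform status populated")
	}
	return ps.VSphere.APIServerInternalIPs, nil
}

func main() {
	_, err := apiServerIPs(&platformStatus{Type: "VSphere"})
	fmt.Println(err)
}
```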

 

Version-Release number of selected component (if applicable):

4.10.65
4.11.47
4.12.29
4.13.8
4.14.0
4.15

How reproducible:

100%

Steps to Reproduce:

1. Run procedure on cluster with above infra
2.
3.

Actual results:

panic

Expected results:

no panic

Additional info:

 

Description of problem:

While/after upgrading to 4.11 2023-01-14, CoreDNS has a problem with UDP overflows, so DNS lookups are very slow and cause the ingress operator upgrade to stall. We needed to work around it with force_tcp, following this: https://access.redhat.com/solutions/5984291

Version-Release number of selected component (if applicable):

 

How reproducible:

100%, but seems to depend on the network environment (exact cause unknown)

Steps to Reproduce:

1. install cluster with OKD 4.11-2022-12-02 or earlier
2. initiate upgrade to OKD 4.11-2023-01-14
3. upgrade will stall after upgrading CoreDNS

Actual results:

CoreDNS logs: [ERROR] plugin/errors: 2 oauth-openshift.apps.okd-admin.muc.lv1871.de. AAAA: dns: overflowing header size 

Expected results:

 

Additional info:

 

Description of problem:

If you check the Ironic API logs from a bootstrap VM, you'll see that terraform is making several GET requests per second. This is far too frequent; bare metal machine states do not change that fast, not even under virtual emulation.

2023-03-01 12:37:38.234 1 INFO eventlet.wsgi.server [None req-c5628ecb-c94c-4b7c-95b3-2ee933ba850b - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200  len: 3659 time: 0.0060174
2023-03-01 12:37:38.240 1 INFO eventlet.wsgi.server [None req-275e077e-8ec7-43a9-8948-e1d39b46b331 - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200  len: 3659 time: 0.0056679
2023-03-01 12:37:38.246 1 INFO eventlet.wsgi.server [None req-0d867822-fcff-4ba0-8773-37415b3f532f - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200  len: 3659 time: 0.0056052
2023-03-01 12:37:38.252 1 INFO eventlet.wsgi.server [None req-7e64cb21-869e-4a98-ad18-54adb6e5dec5 - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200  len: 3659 time: 0.0055907
2023-03-01 12:37:38.258 1 INFO eventlet.wsgi.server [None req-de9995a8-9201-47b0-aa40-505e39b48279 - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200  len: 3659 time: 0.0055318
2023-03-01 12:37:38.265 1 INFO eventlet.wsgi.server [None req-9e969582-0388-4e47-ad5b-966e1fd2a6da - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200  len: 3659 time: 0.0059781
2023-03-01 12:37:38.354 1 INFO eventlet.wsgi.server [None req-84fad0b8-2a28-476e-90c9-ebb6a9cda833 - - - - - -] fd2e:6f44:5dd8:c956::1 "GET /v1/nodes/a7364b73-eefb-4f0a-8d63-753d30b9d090 HTTP/1.1" status: 200  len: 3659 time: 0.0884116
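
A hedged sketch of the kind of throttled polling that would be appropriate here (generic Go, not the terraform provider's actual code): enforce a minimum interval between state GETs instead of issuing several per second.

```
// Illustrative polling helper with a minimum interval.
package main

import (
	"fmt"
	"time"
)

// waitForState polls getState no more often than interval until the wanted
// state is reached or the timeout expires.
func waitForState(getState func() (string, error), want string, interval, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		state, err := getState()
		if err != nil {
			return err
		}
		if state == want {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out waiting for state %q (last: %q)", want, state)
		}
		<-ticker.C // bare metal states change slowly; seconds between polls is plenty
	}
}

func main() {
	calls := 0
	err := waitForState(func() (string, error) {
		calls++
		if calls > 3 {
			return "active", nil
		}
		return "deploying", nil
	}, "active", 10*time.Millisecond, time.Second)
	fmt.Println(err, "calls:", calls)
}
```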

Description

Multiple gherkin files have missing package tags; these tags can be utilised for further automation. Currently, tag allocation is inconsistent across gherkin files.

Acceptance Criteria

  1. Every gherkin file should have a package tag in its first line.

PR: https://github.com/openshift/console/pull/12847

Description of problem:
The oc client has recently had functionality added to reference an icsp manifest with a variety of commands (using the --icsp flag).

The issue is that the registry/repo scope in an ICSP required to trigger a mapping is different between OCP and oc. An OCP ICSP will match an image at the registry level, whereas the oc client requires the exact registry + repo to match. This difference can cause major confusion (especially without adequate warning/error messages in the oc client).

Example image to mirror: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1631b0f0bf9c6dc4f9519ceb06b6ec9277f53f4599853fcfad3b3a47d2afd404

In OCP, registry.mirrorregistry.com:5000/openshift-release-dev/ will accurately mirror the image.

But using oc with --icsp, quay.io/openshift-release-dev/ocp-v4.0-art-dev is required or the mirroring will not match.

Version-Release number of selected component (if applicable):{code:none}
oc version
Client Version: 4.11.0-202212070335.p0.g1928ac4.assembly.stream-1928ac4
Kustomize Version: v4.5.4
Server Version: 4.12.0-rc.8
Kubernetes Version: v1.25.4+77bec7a



How reproducible:

100%

Steps to Reproduce:
1. Create an ICSP file with content similar to below (Replace with your mirror registry url)

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  creationTimestamp: null
  name: image-policy
spec:
  repositoryDigestMirrors:
  - mirrors:
    - registry.mirrorregistry.com:5005/openshift-release-dev
    source: quay.io/openshift-release-dev

2. Add the ICSP to a bm openshift cluster and wait for MCP to finish node restarts
3. SSH to a cluster node
4. Try to podman pull the following image with debug log level

podman pull --log-level=debug quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1631b0f0bf9c6dc4f9519ceb06b6ec9277f53f4599853fcfad3b3a47d2afd404

5. The log will show the mirror registry is attempted (which is similar behavior to OCP)
6. Now try to extract the payload image from the release using the oc client and the --icsp flag (the ICSP file should be the same manifest used at step 1)

oc adm release extract --command=openshift-baremetal-install --to=/data/install-config-generate/installercache/registry.mirrorregistry.com:5005/openshift-release-dev/ocp-release:4.12.0-rc.8-x86_64 --insecure=false --icsp-file=/tmp/icsp-file1635083302 registry.mirrorregistry.com:5005/openshift-release-dev/ocp-release:4.12.0-rc.8-x86_64 --registry-config=/tmp/registry-config1265925963

Expected results:
openshift-baremetal-install is extracted to the proper directory using the mirrored payload image

Actual result:
oc client does not match the payload image because the icsp is not exact, so it immediately tries quay.io rather than the mirror registry

ited with non-zero exit code 1: \nwarning: --icsp-file only applies to images referenced by digest and will be ignored for tags\nerror: unable to read image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1631b0f0bf9c6dc4f9519ceb06b6ec9277f53f4599853fcfad3b3a47d2afd404: Get \"https://quay.io/v2/\": dial tcp 52.203.129.140:443: i/o timeout\n" func=github.com/openshift/assisted-service/internal/oc.execute file="/remote-source/assisted-service/app/internal/oc/release.go:404" go-id=26228 request_id=

Additional info:

I understand that oc-mirror or oc adm release mirror provides an ICSP manifest to use, but since OCP itself allows a wider scope for mapping, it can cause great confusion that the oc ICSP scope is not in parity.

At the very least, a warning/error message in the oc client when the ICSP partially matches an image (but is not used) would be VERY useful.
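
To illustrate the scope mismatch (a simplified sketch, not the actual oc or cluster-side matching code): the cluster effectively treats an ICSP source as a registry/namespace prefix, while oc --icsp-file only matches the exact repository.

```
// Simplified comparison of the two matching behaviours.
package main

import (
	"fmt"
	"strings"
)

const image = "quay.io/openshift-release-dev/ocp-v4.0-art-dev"

// prefixMatch approximates the cluster-side behaviour.
func prefixMatch(source string) bool { return strings.HasPrefix(image, source) }

// exactMatch approximates oc's behaviour.
func exactMatch(source string) bool { return image == source }

func main() {
	source := "quay.io/openshift-release-dev" // typical ICSP source scope
	fmt.Println("cluster-style match:", prefixMatch(source)) // true
	fmt.Println("oc-style match:     ", exactMatch(source))  // false -> oc falls back to quay.io
}
```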

What happens:

When deploying OpenShift 4.13 with Failure Domains, the PrimarySubnet in the ProviderSpec of the Machine is set to the MachinesSubnet set in install-config.yaml.

 

What is expected:

Machines in failure domains with a control-plane port target should not use the MachinesSubnet as the primary subnet in the provider spec. It should be the ID of the subnet that is actually used for the control plane in that domain.

 

How to reproduce:

install-config.yaml:

apiVersion: v1
baseDomain: shiftstack.com
compute:
- name: worker
  platform:
    openstack:
      type: m1.xlarge
  replicas: 1
controlPlane:
  name: master
  platform:
    openstack:
      type: m1.xlarge
      failureDomains:
      - portTargets:
        - id: control-plane
          network:
            id: fb6f8fea-5063-4053-81b3-6628125ed598
          fixedIPs:
          - subnet:
              id: b02175dd-95c6-4025-8ff3-6cf6797e5f86
        computeAvailabilityZone: nova-az1
        storageAvailabilityZone: cinder-az1
      - portTargets:
        - id: control-plane
          network:
            id: 9a5452a8-41d9-474c-813f-59b6c34194b6
          fixedIPs:
          - subnet:
              id: 5fe5b54a-217c-439d-b8eb-441a03f7636d
        computeAvailabilityZone: nova-az1
        storageAvailabilityZone: cinder-az1
      - portTargets:
        - id: control-plane
          network:
            id: 3ed980a6-6f8e-42d3-8500-15f18998c434
          fixedIPs:
          - subnet:
              id: a7d57db6-f896-475f-bdca-c3464933ec02
        computeAvailabilityZone: nova-az1
        storageAvailabilityZone: cinder-az1
  replicas: 3
metadata:
  name: mycluster
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 192.168.10.0/24
  - cidr: 192.168.20.0/24
  - cidr: 192.168.30.0/24
  - cidr: 192.168.72.0/24
  - cidr: 192.168.100.0/24
platform:
  openstack:
    cloud: foch_openshift
    machinesSubnet: b02175dd-95c6-4025-8ff3-6cf6797e5f86
    apiVIPs:
    - 192.168.100.240
    ingressVIPs:
    - 192.168.100.250
    loadBalancer:
      type: UserManaged
featureSet: TechPreviewNoUpgrade

Machine spec:

  Provider Spec:
    Value:
      API Version:  machine.openshift.io/v1alpha1
      Cloud Name:   openstack
      Clouds Secret:
        Name:       openstack-cloud-credentials
        Namespace:  openshift-machine-api
      Flavor:       m1.xlarge
      Image:        foch-bgp-2fnjz-rhcos
      Kind:         OpenstackProviderSpec
      Metadata:
        Creation Timestamp:  <nil>
      Networks:
        Filter:
        Subnets:
          Filter:
            Id:        5fe5b54a-217c-439d-b8eb-441a03f7636d
        Uuid:          9a5452a8-41d9-474c-813f-59b6c34194b6
      Primary Subnet:  b02175dd-95c6-4025-8ff3-6cf6797e5f86
      Security Groups:
        Filter:
        Name:  foch-bgp-2fnjz-master
        Filter:
        Uuid:             1b142123-c085-4e14-b03a-cdf5ef028d91
      Server Group Name:  foch-bgp-2fnjz-master
      Server Metadata:
        Name:                  foch-bgp-2fnjz-master
        Openshift Cluster ID:  foch-bgp-2fnjz
      Tags:
        openshiftClusterID=foch-bgp-2fnjz
      Trunk:  true
      User Data Secret:
        Name:  master-user-data
Status:
  Addresses:
    Address:  192.168.20.20
    Type:     InternalIP
    Address:  foch-bgp-2fnjz-master-1
    Type:     Hostname
    Address:  foch-bgp-2fnjz-master-1
    Type:     InternalDNS 

The machine is connected to the right subnet, but has a wrong PrimarySubnet configured.

The PSA changes introduced in 4.12 meant that we had to figure out a way to ensure that customer workloads (3rd-party or otherwise) wouldn't grind to a halt because pods could not be scheduled due to PSA. The solution found was to have another controller that could introspect a namespace to determine the best pod security standard to apply to it. This controller ignores payload namespaces (usually named openshift-*), but will reconcile non-payload openshift-* namespaces that have a special label applied to them. On the OLM side, we had to create a controller that applies the PSA label sync'er label to non-payload openshift-* namespaces with operators (CSVs) installed in them.

OLM took a dependency on the cluster-policy-controller in order to get the list of payload namespaces. This dependency introduced a few challenges for our CI:

  • we need to ensure parity between the CPC and OLM OpenShift releases, since the list of payload namespaces could vary between OpenShift releases.
  • because the CPC is also a controller, it depends on many of the same libraries as OLM. This can cause vendoring problems, or force OLM to be in lockstep with CPC w.r.t. the common controller libraries.

To avoid these issues, and seeing as the list probably won't update very frequently, we'll make our own copy of the list and maintain it on this side, as this will be less busy work than the alternative.
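
As a toy sketch of that idea (the namespace names and helper below are placeholders, not the maintained list or the actual OLM controller code), the decision boils down to labelling only non-payload openshift-* namespaces that contain a CSV:

```
// Placeholder sketch of the labelling decision.
package main

import (
	"fmt"
	"strings"
)

// payloadNamespaces stands in for the copied list that would be kept in
// sync with the release payload out-of-band.
var payloadNamespaces = map[string]bool{
	"openshift-monitoring": true,
	"openshift-etcd":       true,
}

// needsPSALabelSync returns true only for non-payload openshift-* namespaces
// that have at least one CSV installed.
func needsPSALabelSync(namespace string, hasCSV bool) bool {
	if !strings.HasPrefix(namespace, "openshift-") {
		return false
	}
	if payloadNamespaces[namespace] {
		return false // payload namespaces are ignored
	}
	return hasCSV
}

func main() {
	fmt.Println(needsPSALabelSync("openshift-operators", true)) // true
	fmt.Println(needsPSALabelSync("openshift-etcd", true))      // false
}
```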

Description of problem:

ose-gcp-pd-csi-driver fails to build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=54433295

Error:
/usr/lib/golang/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
gcc: error: static: No such file or directory

make: *** [Makefile:40: gce-pd-driver] Error 1

Version-Release number of selected component (if applicable):

4.14 / master

How reproducible:

run osbs build

Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


This is a clone of issue OCPBUGS-17724. The following is the description of the original issue:

Environment: OCP 4.12.24
Installation Method: IPI: Manual Mode + STS using a customer-provided AWS IAM Role

I am trying to deploy an OCP4 cluster on AWS for my customer. The customer does not permit creation of IAM users so I am performing a Manual Mode with STS IPI installation instead. I have been given an IAM role to assume for the OCP installation, but unfortunately the customer's AWS Organizational Service Control Policy (SCP) does not permit the use of the iam:GetUser{} permission.

(I have informed my customer that iam:GetUser is an installation requirement - it's clearly documented in our docs, and I have raised a ticket with their internal support team requesting that their SCP is amended to include iam:getUser, however I have been informed that my request is likely to be rejected).

With this limitation understood, I still attempted to install OCP4. Surprisingly, I was able to deploy an OCP (4.12) cluster without any apparent issues, however when I tried to destroy the cluster I encountered the following error from the installer (note: fields in brackets <> have been redacted):

DEBUG search for IAM roles
DEBUG iterating over a page of 74 IAM roles
DEBUG search for IAM users
DEBUG iterating over a page of 1 IAM users
INFO get tags for <ARN of the IAM user>: AccessDenied: User:<ARN of my user> is not authorized to perform: iam:GetUser on resource: <IAM username> with an explicit deny in a service control policy
INFO status code: 403, request id: <request ID>
DEBUG search for IAM instance profiles
INFO error while finding resources to delete error=get tags for <ARN of IAM user> AccessDenied: User:<ARN of my user> is not authorized to perform: iam:GetUser on resource: <IAM username> with an explicit deny in a service control policy status code: 403, request id: <request ID>

Similarly, the error in AWS CloudTrail logs shows the following (note: some fields in brackets have been redacted):
User: arn:aws:sts::<AWS account no>:assumed-role/<role-name>/<user name> is not authorized to perform: iam:GetUser on resource <IAM User> with an explicit deny in a service control policy

It appears that the destroy operation is failing when the installer is trying to list tags on the only IAM user in the customer's AWS account. As discussed, the SCP does not permit the use of iam:GetUser and consequently this API call on the IAM user is denied. The installer then enters an endless loop as it continuously retries the operation. We have potentially identified the iamUserSearch function within the installer code at pkg/destroy/aws/iamhelpers.go as the area where this call is failing.

See: https://github.com/openshift/installer/blob/16f19ea94ecdb056d4955f33ddacc96c57341bb2/pkg/destroy/aws/iamhelpers.go#L95

There does not appear to be a handler for "AccessDenied" API error in this function. Therefore we request that the access denied event is gracefully handled and skipped over when processing IAM users, allowing the installer to continue with the destroy operation, much in the same way that a similar access denied event is handled within the iamRoleSearch function when processing IAM roles:

See: https://github.com/openshift/installer/blob/16f19ea94ecdb056d4955f33ddacc96c57341bb2/pkg/destroy/aws/iamhelpers.go#L51
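
A minimal sketch of that kind of handling (mirroring the shape of the existing role handling and assuming the standard aws-sdk-go awserr error codes; this is not the installer's actual fix):

```
// Illustrative AccessDenied handling for per-user tag lookups.
package main

import (
	"errors"
	"fmt"

	"github.com/aws/aws-sdk-go/aws/awserr"
)

// tagsForUser returns (tags, found, error). An AccessDenied error is treated
// as "skip this user" so the destroy operation can continue.
func tagsForUser(getTags func() (map[string]string, error)) (map[string]string, bool, error) {
	tags, err := getTags()
	if err != nil {
		var aerr awserr.Error
		if errors.As(err, &aerr) && aerr.Code() == "AccessDenied" {
			// SCP forbids iam:GetUser on this user; skip it and keep deleting.
			return nil, false, nil
		}
		return nil, false, err
	}
	return tags, true, nil
}

func main() {
	_, ok, err := tagsForUser(func() (map[string]string, error) {
		return nil, awserr.New("AccessDenied", "explicit deny in a service control policy", nil)
	})
	fmt.Println(ok, err) // false <nil> -> user skipped, destroy continues
}
```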

We therefore request that the following is considered and addressed:

1. Re-assess if the iam:GetUser permission is actually needed for cluster installation/cluster operations.
2. If the permission is required, then the installer should provide a warning or halt the installation.
3. During a "destroy" cluster operation, the installer should gracefully handle AccessDenied errors from the API, "skip over" any IAM users that the installer does not have permission to list tags for, and then continue gracefully with the destroy operation.

Description of problem:

Business Automation Operands fail to load in the uninstall operator modal, with a "Cannot load Operands. There was an error loading operands for this operator. Operands will need to be deleted manually..." alert message.

The "Delete all operand instances for this operator__checkbox" is not shown, so the test fails.

https://search.ci.openshift.org/?search=Testing+uninstall+of+Business+Automation+Operator&maxAge=168h&context=1&type=junit&name=pull-ci-openshift-console-master-e2e-gcp-console&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

While troubleshooting a problem, oc incorrectly recommended the deprecated command "oc adm registry" in its output text.

Version-Release number of selected component (if applicable):

$ oc version
Client Version: 4.12.0-202302280915.p0.gb05f7d4.assembly.stream-b05f7d4
Kustomize Version: v4.5.7
Server Version: 4.12.6
Kubernetes Version: v1.25.4+18eadca

Though this is likely broken in all previous version of openshift4

How reproducible:

Only during error conditions where this error message is printed.

Steps to Reproduce:

1. have cluster without proper storage configured for the registry
2. try to build something.
3. "oc status --suggest" prints message with deprecated "oc admin registry" command.

Actual results:

$ oc status --suggest
In project pvctest on server https://api.pelauter-bm01.lab.home.arpa:6443

https://my-test-pvctest.apps.pelauter-bm01.lab.home.arpa (redirects) to pod port 8080-tcp (svc/my-test)
  deployment/my-test deploys istag/my-test:latest <-
    bc/my-test source builds https://github.com/sclorg/django-ex.git on openshift/python:3.9-ubi8
      build #1 new for 3 hours (can't push to image)
    deployment #1 running for 3 hours - 0/1 pods

Errors:
  * bc/my-test is pushing to istag/my-test:latest, but the administrator has not configured the integrated container image registry.

    try: oc adm registry -h

^ oc adm registry is deprecated in openshift4; this should guide the user to the registry operator.

Expected results:

A pointer to the proper feature to manage the registry, like the openshift registry operator.

Additional info:

I know my cluster is not set up correctly, but oc should still not give me incorrect information.
If this version of oc is expected to also work against ocp3 clusters, the fix should take this into account, where that command is still valid.

Description of problem:

Since `registry.centos.org` has been shut down, all the unit tests in oc relying on this registry started failing.

Version-Release number of selected component (if applicable):

all versions

How reproducible:

trigger CI jobs and see unit tests are failing

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

The usage of "compute.platform.gcp.serviceAccount" needs to be clarified, and the installation failure also needs to be addressed.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-16-230237

How reproducible:

Always

Steps to Reproduce:

1. "openshift-install explain installconfig.compute.platform.gcp.serviceAccount"
2. "create cluster" with an existing install-config having the field configured 

Actual results:

1. It tells "The provided service account will be attached to control-plane nodes...", although the field is under compute.platform.gcp.
2. The installation failed when creating the install config, with the error "service accounts only valid for master nodes, provided for worker nodes".

Expected results:

1. Should the "explain" command describe the "serviceAccount" field under "installconfig.compute.platform.gcp"?
2. Please clarify how "compute.platform.gcp.serviceAccount" should be used.

Additional info:

FYI the corresponding PR: https://github.com/openshift/installer/pull/7308

$ openshift-install version
openshift-install 4.14.0-0.nightly-2023-07-16-230237
built from commit c2d7db9d4eedf7b79fcf975f3cbd8042542982ca
release image registry.ci.openshift.org/ocp/release@sha256:e31716b6f12a81066c78362c2f36b9f18ad51c9768bdc894d596cf5b0f689681
release architecture amd64
$ openshift-install explain installconfig.compute.platform.gcp.serviceAccount
KIND:     InstallConfig
VERSION:  v1
RESOURCE: <string>
  ServiceAccount is the email of a gcp service account to be used for shared vpn installations. The provided service account will be attached to control-plane nodes in order to provide the permissions required by the cloud provider in the host project.

$ openshift-install explain installconfig.controlPlane.platform.gcp.serviceAccount
KIND:     InstallConfig
VERSION:  v1
RESOURCE: <string>
  ServiceAccount is the email of a gcp service account to be used for shared vpn installations. The provided service account will be attached to control-plane nodes in order to provide the permissions required by the cloud provider in the host project.

$ yq-3.3.0 r test2/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
  computeSubnet: installer-shared-vpc-subnet-2
  controlPlaneSubnet: installer-shared-vpc-subnet-1
  network: installer-shared-vpc
  networkProjectID: openshift-qe-shared-vpc
$ yq-3.3.0 r test2/install-config.yaml credentialsMode
Passthrough
$ yq-3.3.0 r test2/install-config.yaml baseDomain
qe1.gcp.devcluster.openshift.com
$ yq-3.3.0 r test2/install-config.yaml metadata
creationTimestamp: null
name: jiwei-0718b
$ yq-3.3.0 r test2/install-config.yaml compute
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    gcp:
      ServiceAccount: ipi-xpn-minpt-permissions@openshift-qe.iam.gserviceaccount.com
      tags:
      - preserved-ipi-xpn-compute
  replicas: 2
$ yq-3.3.0 r test2/install-config.yaml controlPlane
architecture: amd64
hyperthreading: Enabled
name: master
platform:
  gcp:
    ServiceAccount: ipi-xpn-minpt-permissions@openshift-qe.iam.gserviceaccount.com
    tags:
    - preserved-ipi-xpn-control-plane
replicas: 3
$ openshift-install create cluster --dir test2
ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: compute[0].platform.gcp.serviceAccount: Invalid value: "ipi-xpn-minpt-permissions@openshift-qe.iam.gserviceaccount.com": service accounts only valid for master nodes, provided for worker nodes 
$ 

Description of problem:

OCP upgrade blocks because of cluster operator csi-snapshot-controller fails to start its deployment with a fatal message of read-only filesystem

Version-Release number of selected component (if applicable):

Red Hat OpenShift 4.11
rhacs-operator.v3.72.1

How reproducible:

At least once in the user's cluster while upgrading

Steps to Reproduce:

1. Have a OCP 4.11 installed
2. Install ACS on top of the OCP cluster
3. Upgrade OCP to the next z-stream version

Actual results:

Upgrade gets blocked: waiting on csi-snapshot-controller

Expected results:

Upgrade should succeed

Additional info:

The stackrox SCCs (stackrox-admission-control, stackrox-collector and stackrox-sensor) have `readOnlyRootFilesystem` set to `true`. If an SCC is not explicitly defined/requested, other Pods might receive one of these SCCs, which will make the deployment fail with a `read-only filesystem` message.

Description of problem:

oauth user:check-access scoped tokens cannot be used to check access as intended. SelfSubjectAccessReviews from such scoped tokens always report allowed: false, denied: true, unless the SelfSubjectAccessReview is checking access for the ability to create SelfSubjectAccessReviews. This does not seem like the intended behavior per the documentation.

https://docs.openshift.com/container-platform/4.12/authentication/tokens-scoping.html

oauth user:check-access scoped tokens only have authorization for SelfSubjectAccessReview. This is as intended, and seems to be enforced by the scope authorizer. However, the authorizer used by SelfSubjectAccessReview includes this filter as well, meaning the returned response is useless (you can only check access to SelfSubjectAccessReview itself instead of using the token to check the RBAC access of the parent user the token is scoped from).

https://github.com/openshift/kubernetes/blob/master/openshift-kube-apiserver/authorization/scopeauthorizer/authorizer.go

https://github.com/openshift/kubernetes/blob/master/pkg/registry/authorization/selfsubjectaccessreview/rest.go

 

Version-Release number of selected component (if applicable):

 

How reproducible:

Create user:check-access scoped token.  Token must not have user:full scope.  Use the token to do a SelfSubjectAccessReview.

Steps to Reproduce:

1. Create user:check-access scoped token.  Must not have user:full scope.
2. Use the token to do a SelfSubjectAccessReview against a resource the parent user has access to.
3. Observe the status response is allowed: false, denied: true.

Actual results:

Unable to check user access with a user:check-access scoped token.

Expected results:

Ability to check user access with a user:check-access scoped token, without user:full scope which would give the token full access and abilities of the parent user.

Additional info:

 

Description of problem:

The kube-controller-manager container cluster-policy-controller will show unusual error logs, such as "
I0214 10:49:34.698154       1 interface.go:71] Couldn't find informer for template.openshift.io/v1, Resource=templateinstances
I0214 10:49:34.698159       1 resource_quota_monitor.go:185] QuotaMonitor unable to use a shared informer for resource "template.openshift.io/v1, Resource=templateinstances": no informer found for template.openshift.io/v1, Resource=templateinstances
"

Version-Release number of selected component (if applicable):

 

How reproducible:

when the cluster-policy-controller restarts, you will see these logs

Steps to Reproduce:

1.oc logs kube-controller-manager-master0 -n openshift-kube-controller-manager -c cluster-policy-controller  

Actual results:

 

Expected results:

 

Additional info:

 

Description of the problem:

BE 2.15.x: the API and Ingress VIP values don't have validation for network/broadcast IPs (i.e. if the network is 192.168.123.0/24 --> 192.168.123.0 and 192.168.123.255).

How reproducible:

100%

Steps to reproduce:

1. Create a cluster with an Ingress or API VIP set to a broadcast IP

2.

3.

Actual results:

 

Expected results:
BE should block those IPs
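
A minimal sketch of the missing validation (assumed behaviour, not the assisted-service's actual validator): reject any VIP that equals the network or broadcast address of the machine network CIDR.

```
// Illustrative VIP validation against a machine network CIDR.
package main

import (
	"fmt"
	"net"
)

// isUsableVIP reports whether vip lies inside cidr and is neither the
// network address nor the broadcast address.
func isUsableVIP(vip string, cidr string) (bool, error) {
	ip := net.ParseIP(vip)
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil || ip == nil {
		return false, fmt.Errorf("invalid input")
	}
	if !ipnet.Contains(ip) {
		return false, nil
	}
	v4 := ip.To4()
	network := ipnet.IP.To4()
	if v4 == nil || network == nil {
		return true, nil // simplification: skip the broadcast math for IPv6
	}
	broadcast := make(net.IP, len(network))
	for i := range network {
		broadcast[i] = network[i] | ^ipnet.Mask[i]
	}
	return !v4.Equal(network) && !v4.Equal(broadcast), nil
}

func main() {
	fmt.Println(isUsableVIP("192.168.123.255", "192.168.123.0/24")) // false -> should be blocked
	fmt.Println(isUsableVIP("192.168.123.10", "192.168.123.0/24"))  // true
}
```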

Description of problem:

When a machine is created with a compute availability zone (defined via mpool.zones) and a storage root volume (defined as mpool.rootVolume) and that rootVolume has no specified zones, CAPO will use the compute AZ for the volume AZ.

This can be problematic if the AZ doesn't exist in Cinder.
Source:

https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/9d183bd479fe9aed4f6e7ac3d5eee46681c518e7/pkg/cloud/services/compute/instance.go#L439-L442

Version-Release number of selected component (if applicable):

All versions supporting rootVolume AZ.

Steps to Reproduce:

1. In install-config.yaml, add "zones" with valid Nova AZs, and a rootVolume without "zones". Your OpenStack cloud must not have Cinder AZs (only Nova AZs)
2. Day 1 deployment will go fine; Terraform will create the machines with no AZ.
3. Day 2 operations on machines will fail since CAPO tries to use the Nova AZ for the root volume if no volume AZ is provided, but since the AZs don't match between Cinder & Nova, the machine will never be created

Actual results:

Machine not created

Expected results:

Machine created in the right AZ for both Nova & Cinder

Description of problem:

The 'Create' button on the image pull secret creation form cannot be re-enabled once it has been disabled

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-02-17-090603

How reproducible:

Always

Steps to Reproduce:

1. User logs in to the console
2. Goes to Secrets -> Create Image pull secret; on the page:
- Secret name: test-secret
- Authentication type: Upload configuration file; here we upload a file that is not valid JSON, so the console gives the warning message 'Configuration file should be in JSON format.' and the 'Create' button is disabled
3. Then we change Authentication type to 'Image registry credentials' and fill in all required fields (Registry server address, Username and Password); we can see the 'Create' button is still disabled

Actual results:

3. The 'Create' button is still disabled; the user has to cancel and fill in the form again

Expected results:

3. The 'Create' button should be re-enabled, since we are now filling in the form in a different way with all required fields correctly configured

Additional info:

 

 

 

Description of problem:

IngressVIP is getting attached to two nodes at once.

Version-Release number of selected component (if applicable):

4.11.39

How reproducible:

Always in customer cluster

Actual results:

IngressVIP is getting attached to two nodes at once.

Expected results:

IngressVIP should get attached to only one node.

Additional info:

 

Description of problem:

We are seeing flakes in HyperShift CI jobs: https://search.ci.openshift.org/?search=Alerting+rule+%22CsvAbnormalFailedOver2Min%22&maxAge=48h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Sample failure: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_ovn-kubernetes/1692/pull-ci-openshift-ovn-kubernetes-master-e2e-aws-ovn-hypershift/1664244482360479744

{  fail [github.com/openshift/origin/test/extended/prometheus/prometheus.go:148]: Incompliant rules detected:

Alerting rule "CsvAbnormalFailedOver2Min" (group: olm.csv_abnormal.rules) has no 'description' annotation, but has a 'message' annotation. OpenShift alerts must use 'description' -- consider renaming the annotation
Alerting rule "CsvAbnormalFailedOver2Min" (group: olm.csv_abnormal.rules) has no 'summary' annotation
Alerting rule "CsvAbnormalOver30Min" (group: olm.csv_abnormal.rules) has no 'description' annotation, but has a 'message' annotation. OpenShift alerts must use 'description' -- consider renaming the annotation
Alerting rule "CsvAbnormalOver30Min" (group: olm.csv_abnormal.rules) has no 'summary' annotation
Ginkgo exit error 1: exit with code 1}

Version-Release number of selected component (if applicable):

4.14 CI

How reproducible:

sometimes

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

With 120+ node clusters, we are seeing an O(10)x higher rate of node patch requests coming from node service accounts. This higher rate of updates is causing issues where "nodes" watchers are being terminated, causing a storm of watch requests that increases CPU load on the cluster.

What I see is that node resourceVersions are incremented rapidly and in large bursts, and watchers are terminated as a result.

Version-Release number of selected component (if applicable):

4.14.0-ec.4
4.14.0-0.nightly-2023-08-08-222204
4.13.0-0.nightly-2023-08-10-021434

How reproducible:

Repeatable

Steps to Reproduce:

1. Create 4.14 cluster with 120 nodes with m5.8xlarge control plane and c5.4xlarge workers.
2. Run `oc get nodes -w -o custom-columns='NAME:.metadata.name,RV:.metadata.resourceVersion' ` 
3. Wait for a big chunk of nodes to be updated and observe the watch terminate.
4. Optionally run `kube-burner ocp node-density-cni --pods-per-node=100` to generate some load.

Actual results:

kube-apiserver audit events show >1500 node patch requests from a single node SA in a certain amount of time:
   1678 ["system:node:ip-10-0-69-142.us-west-2.compute.internal",null]
   1679 ["system:node:ip-10-0-33-131.us-west-2.compute.internal",null]
   1709 ["system:node:ip-10-0-41-44.us-west-2.compute.internal",null]

Observe that apiserver_terminated_watchers_total{resource="nodes"} starts to increment before 120 node scaleup is even complete.

Expected results:

patch requests in the same amount of time are more aligned with what we see on the 4.13 *08-10* nightly:
     57 ["system:node:ip-10-0-247-122.us-west-2.compute.internal",null]
     62 ["system:node:ip-10-0-239-217.us-west-2.compute.internal",null]
     63 ["system:node:ip-10-0-165-255.us-west-2.compute.internal",null]
     64 ["system:node:ip-10-0-136-122.us-west-2.compute.internal",null]

Observe that apiserver_terminated_watchers_total{resource="nodes"} does not increment.

Observe that rate of mutating node requests levels off after nodes are created.

Additional info:

We suspect these updates coming from nodes could be part of a response to the MCO controllerconfigs resource being updated every few minutes or more frequently.

This is one of the suspected causes of the increased kube-apiserver CPU usage seen during the ovn-ic investigation.

Description of problem:

Accidentally merged before being fully reviewed.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-18113. The following is the description of the original issue:

Description of problem:

When the installer generates a CPMS, it should only add the `failureDomains` field when there is more than one failure domain. When there is only one failure domain, the fields from the failure domain, eg the zone, should be injected directly into the provider spec and the failure domain should be omitted.

By doing this, we avoid having to care about failure domain injection logic for single zone clusters. Potentially avoiding bugs (such as some we have seen recently).

IIRC we already did this for OpenStack, but AWS, Azure and GCP may not be affected.
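
A toy sketch of the intended generation logic (simplified types, not the installer's code): with exactly one failure domain, inject its values directly into the provider spec and emit no failureDomains stanza at all.

```
// Simplified illustration of single-failure-domain handling.
package main

import "fmt"

type failureDomain struct{ Zone string }

type providerSpec struct{ Zone string }

// generate folds a lone failure domain into the provider spec and drops the
// failureDomains list entirely; multiple domains are passed through as-is.
func generate(base providerSpec, domains []failureDomain) (providerSpec, []failureDomain) {
	if len(domains) == 1 {
		base.Zone = domains[0].Zone
		return base, nil // omit failureDomains entirely
	}
	return base, domains
}

func main() {
	spec, fds := generate(providerSpec{}, []failureDomain{{Zone: ""}})
	fmt.Printf("%+v failureDomains=%v\n", spec, fds)
}
```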

Version-Release number of selected component (if applicable):

 

How reproducible:

Can be demonstrated on Azure in the westus region, which has no AZs available. Currently the installer creates the following, which we can omit entirely:
```
failureDomains:
  platform: Azure
  azure:
  - zone: ""
```

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

The X-CSRF token is currently added automatically for any request that uses the `coFetch` functions. In some cases, plugins would like to use their own functions/libraries, such as axios. The console should enable retrieving the X-CSRF token.

Acceptance Criteria:

  • Dynamic plugin can retrieve X-CSRF token via their own functions (axios)

Description of problem:

There is a forcedns dispatcher script, added by the assisted-installer installation process, that creates /etc/resolv.conf.

This script has no shebang, which caused installation to fail because no resolv.conf was generated.

In order to fix upgrades in already-installed clusters we need to work around this issue.

 

Version-Release number of selected component (if applicable):

4.13.0

How reproducible:

Happens every time

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

A 'Show tooltips' toggle was added on the resource YAML page, but the checkbox icon is not aligned with the other icons

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-05-23-103225

How reproducible:

Always

Steps to Reproduce:

1. Go to any resource YAML page and check the 'Show tooltips' icon position
2.
3.

Actual results:

1. The checkbox is a little above the other icons; see the screenshot https://drive.google.com/file/d/10wKeRaaE76GBXBph93wAkFCWYGrEKcA9/view?usp=share_link

Expected results:

1. all icons should be aligned

Additional info:

 

Description of problem:

Installation failed when setting featureSet: LatencySensitive or featureSet: CustomNoUpgrade.
When setting featureSet: CustomNoUpgrade in install-config and creating the cluster, see the error info below:
[core@bootstrap ~]$ journalctl -b -f -u release-image.service -u bootkube.service
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]:         github.com/spf13/cobra@v1.6.0/command.go:968
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: k8s.io/component-base/cli.run(0xc00025c300)
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]:         k8s.io/component-base@v0.26.1/cli/run.go:146 +0x317
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: k8s.io/component-base/cli.Run(0x2ce59e8?)
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]:         k8s.io/component-base@v0.26.1/cli/run.go:46 +0x1d
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]: main.main()
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670367]:         github.com/openshift/cluster-kube-controller-manager-operator/cmd/cluster-kube-controller-manager-operator/main.go:24 +0x2c
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Failed with result 'exit-code'.
Apr 26 07:02:48 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Consumed 1.935s CPU time.
Apr 26 07:02:54 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 343.
Apr 26 07:02:54 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Apr 26 07:02:54 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: bootkube.service: Consumed 1.935s CPU time.
Apr 26 07:02:54 bootstrap.wwei-426g.qe.devcluster.openshift.com systemd[1]: Started Bootstrap a Kubernetes cluster.
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[670489]: Rendering Kubernetes Controller Manager core manifests...
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: panic: interface conversion: interface {} is nil, not []interface {}
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: goroutine 1 [running]:
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/targetconfigcontroller.GetKubeControllerManagerArgs(0xc000746100?)
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]:         github.com/openshift/cluster-kube-controller-manager-operator/pkg/operator/targetconfigcontroller/targetconfigcontroller.go:696 +0x379
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render.(*renderOpts).Run(0xc0008d22c0)
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]:         github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render/render.go:269 +0x85c
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render.NewRenderCommand.func1.1(0x0?)
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]:         github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render/render.go:48 +0x32
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render.NewRenderCommand.func1(0xc000bee600?, {0x285dffa?, 0x8?, 0x8?})
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]:         github.com/openshift/cluster-kube-controller-manager-operator/pkg/cmd/render/render.go:58 +0xc8
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/spf13/cobra.(*Command).execute(0xc000bee600, {0xc00071cb00, 0x8, 0x8})
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]:         github.com/spf13/cobra@v1.6.0/command.go:920 +0x847
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/spf13/cobra.(*Command).ExecuteC(0xc000bee000)
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]:         github.com/spf13/cobra@v1.6.0/command.go:1040 +0x3bd
Apr 26 07:02:56 bootstrap.wwei-426g.qe.devcluster.openshift.com bootkube.sh[672314]: github.com/spf13/cobra.(*Command).Execute(...)
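
The CustomNoUpgrade panic above ("interface conversion: interface {} is nil, not []interface {}") is the classic failure mode of a bare type assertion on an absent field. A minimal, self-contained illustration of the comma-ok form that avoids it (not the operator's actual code; the field name below is hypothetical):

```
// Toy reproduction of the panic and its defensive fix.
package main

import "fmt"

func main() {
	unstructured := map[string]interface{}{} // field absent for this featureSet

	// A bare assertion would panic: unstructured["extendedArguments"].([]interface{})
	args, ok := unstructured["extendedArguments"].([]interface{})
	if !ok {
		fmt.Println("no extended arguments configured; using defaults")
		return
	}
	fmt.Println(args)
}
```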


When setting featureSet: LatencySensitive in install-config and creating the cluster, see the error info below:
[core@bootstrap ~]$ journalctl -b -f -u release-image.service -u bootkube.service
Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: "cluster-infrastructure-02-config.yml": failed to create infrastructures.v1.config.openshift.io/cluster -n : the server could not find the requested resource
Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: Failed to create "cluster-infrastructure-02-config.yml" infrastructures.v1.config.openshift.io/cluster -n : the server could not find the requested resource
Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: [#1105] failed to create some manifests:
Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: "cluster-infrastructure-02-config.yml": failed to create infrastructures.v1.config.openshift.io/cluster -n : the server could not find the requested resource
Apr 26 07:07:09 bootstrap.wwei-426h.qe.devcluster.openshift.com bootkube.sh[16835]: Failed to create "cluster-infrastructure-02-config.yml" infrastructures.v1.config.openshift.io/cluster -n : the server could not find the requested resource

Version-Release number of selected component (if applicable):

OCP version: 4.13.0-0.nightly-2023-04-21-084440

How reproducible:

always

Steps to Reproduce:

1. Create install-config.yaml like below (LatencySensitive):
  apiVersion: v1
  controlPlane:
    architecture: amd64
    hyperthreading: Enabled
    name: master
    replicas: 3
  compute:
  - architecture: amd64
    hyperthreading: Enabled
    name: worker
    replicas: 2
  metadata:
    name: wwei-426h
  platform:
   none: {}
  pullSecret: xxxxx
  featureSet: LatencySensitive
  networking:
    clusterNetwork:
    - cidr: xxxxx
      hostPrefix: 23
    serviceNetwork:
    - xxxxx
    networkType: OpenShiftSDN
  publish: External
  baseDomain: xxxxxx
  sshKey: xxxxxxx

2. Then continue to install the cluster:
openshift-install create cluster --dir <install_folder> --log-level debug

3. Create install-config.yaml like below (CustomNoUpgrade):
  apiVersion: v1
  controlPlane:
    architecture: amd64
    hyperthreading: Enabled
    name: master
    replicas: 3
  compute:
  - architecture: amd64
    hyperthreading: Enabled
    name: worker
    replicas: 2
  metadata:
    name: wwei-426h
  platform:
   none: {}
  pullSecret: xxxxx
  featureSet: CustomNoUpgrade
  networking:
    clusterNetwork:
    - cidr: xxxxx
      hostPrefix: 23
    serviceNetwork:
    - xxxxx
    networkType: OpenShiftSDN
  publish: External
  baseDomain: xxxxxx
  sshKey: xxxxxxx

4. Then continue to install the cluster:
openshift-install create cluster --dir <install_folder> --log-level debug

Actual results:

Installation failed.

Expected results:

Installation succeeded.

Additional info:

log-bundle can get from below link : https://drive.google.com/drive/folders/1kg1EeYR6ApWXbeRZTiM4DV205nwMfSQv?usp=sharing

Description of problem:

The ExternalLink 'OpenShift Pipelines based on Tekton' in Pipeline Build Strategy deprecation Alert is incorrect, currently it's defined as https://openshift.github.io/pipelines-docs/ and will redirect to a 'Not found' page

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-04-133505

How reproducible:

Always

Steps to Reproduce:

1. $ oc new-app -n test https://github.com/openshift/cucushift/blob/master/testdata/pipeline/samplepipeline.yaml
  
   OR Create Jenkins server and Pipeline BC
   $ oc new-app https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/jenkins-ephemeral-template.json
   $ oc new-app -f https://raw.githubusercontent.com/openshift/origin/master/examples/jenkins/pipeline/samplepipeline.yaml

2. Admin user login console and navigate to Builds -> Build Configs -> sample-pipeline Details page
3. Check the External link 'OpenShift Pipelines based on Tekton' in the 'Pipeline build strategy deprecation' Alert

Actual results:

Now a 'Not found' page would be redirected for the user

Expected results:

The link should be correct and point to an existing page

Additional info:

Impact file build.tsx
https://github.com/openshift/console/blob/a0e7e98e5ffe4aca73f9f1f441d15cc4e9b33ee6/frontend/public/components/build.tsx#LL238C17-L238C60

Base bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1768350

Description of problem:

Bump Kubernetes to 0.27.1 and bump dependencies

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Getting the below error while creating a cluster in the mon01 zone.
Job link: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-multiarch-master-nightly-4.14-ocp-e2e-ovn-ppc64le-powervs/1680759459892170752
Error:
level=info msg=Cluster operator insights SCAAvailable is False with Forbidden: Failed to pull SCA certs from https://api.openshift.com/api/accounts_mgmt/v1/certificates: OCM API https://api.openshift.com/api/accounts_mgmt/v1/certificates returned HTTP 403: {"code":"ACCT-MGMT-11","href":"/api/accounts_mgmt/v1/errors/11","id":"11","kind":"Error","operation_id":"c3773b1e-8818-4bfc-9605-dbd9dbc0c03f","reason":"Account with ID 2DUeKzzTD9ngfsQ6YgkzdJn1jA4 denied access to perform create on Certificate with HTTP call POST /api/accounts_mgmt/v1/certificates"}
level=info msg=Cluster operator network ManagementStateDegraded is False with : 
level=error msg=Cluster operator storage Degraded is True with PowerVSBlockCSIDriverOperatorCR_PowerVSBlockCSIDriverStaticResourcesController_SyncError: PowerVSBlockCSIDriverOperatorCRDegraded: PowerVSBlockCSIDriverStaticResourcesControllerDegraded: "rbac/main_attacher_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "openshift-csi-main-attacher-role" not found
level=error msg=PowerVSBlockCSIDriverOperatorCRDegraded: PowerVSBlockCSIDriverStaticResourcesControllerDegraded: "rbac/main_provisioner_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "openshift-csi-main-provisioner-role" not found
level=error msg=PowerVSBlockCSIDriverOperatorCRDegraded: PowerVSBlockCSIDriverStaticResourcesControllerDegraded: "rbac/volumesnapshot_reader_provisioner_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "openshift-csi-provisioner-volumesnapshot-reader-role" not found
level=error msg=PowerVSBlockCSIDriverOperatorCRDegraded: PowerVSBlockCSIDriverStaticResourcesControllerDegraded: "rbac/main_resizer_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "openshift-csi-main-resizer-role" not found
level=error msg=PowerVSBlockCSIDriverOperatorCRDegraded: PowerVSBlockCSIDriverStaticResourcesControllerDegraded: "rbac/storageclass_reader_resizer_binding.yaml" (string): clusterroles.rbac.authorization.k8s.io "openshift-csi-resizer-storageclass-reader-role" not found
level=error msg=PowerVSBlockCSIDriverOperatorCRDegraded: PowerVSBlockCSIDriverStaticResourcesControllerDegraded: 

Version-Release number of selected component (if applicable):

4.14

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Expected results:

cluster creation should be successful

Additional info:

 

This is a clone of issue OCPBUGS-20337. The following is the description of the original issue:

Description of problem:

Fixing PR https://github.com/openshift/origin/pull/28295 to skip on disconnected cluster
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-metal-ipi-ovn-ipv6/1711658474384920576

Version-Release number of selected component (if applicable):

4.15

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Business Automation Operands fail to load in the uninstall operator modal, with a "Cannot load Operands. There was an error loading operands for this operator. Operands will need to be deleted manually..." alert message.

The "Delete all operand instances for this operator__checkbox" is not shown, so the test fails.

https://search.ci.openshift.org/?search=Testing+uninstall+of+Business+Automation+Operator&maxAge=168h&context=1&type=junit&name=pull-ci-openshift-console-master-e2e-gcp-console&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

In an STS cluster with the TechPreviewNoUpgrade featureset enabled, CCO ignores CRs whose .spec.providerSpec.stsIAMRoleARN is unset. 

While the CR controller does not provision a Secret for the aforementioned type of CRs, it still sets .status.provisioned to true for them. 

Steps to Reproduce:

1. Create an STS cluster, enable the feature set. 

2. Create a dummy CR like the following:
fxie-mac:cloud-credential-operator fxie$ cat cr2.yaml
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  name: test-cr-2
  namespace: openshift-cloud-credential-operator
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
    - action:
      - ec2:CreateTags
      effect: Allow
      resource: '*'
  secretRef:
    name: test-secret-2
    namespace: default
  serviceAccountNames:
  - default

3. Check CR.status
fxie-mac:cloud-credential-operator fxie$ oc get credentialsrequest test-cr-2 -n openshift-cloud-credential-operator -o yaml
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
  creationTimestamp: "2023-07-24T09:21:44Z"
  finalizers:
  - cloudcredential.openshift.io/deprovision
  generation: 1
  name: test-cr-2
  namespace: openshift-cloud-credential-operator
  resourceVersion: "180154"
  uid: 34b36cac-3fca-4fa5-a003-a9b64c5fbf00
spec:
  providerSpec:
    apiVersion: cloudcredential.openshift.io/v1
    kind: AWSProviderSpec
    statementEntries:
    - action:
      - ec2:CreateTags
      effect: Allow
      resource: '*'
  secretRef:
    name: test-secret-2
    namespace: default
  serviceAccountNames:
  - default
status:
  lastSyncGeneration: 0
  lastSyncTimestamp: "2023-07-24T09:39:40Z"
  provisioned: true 

"etcdserver: leader changed" causes clients to fail.

This error should never bubble up to clients because the kube-apiserver can always retry this failure mode since it knows the data was not modified. When etcd adjusts timeouts for leader election and heartbeating for slow hardware like Azure, the hardcoded timeouts in the kube-apiserver/etcd fail. See

  1. kube-apiserver tries to use etcd retries: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/storage/storagebackend/factory/etcd3.go#L308-L317
  2. etcd retries appear to be unconditionally added: https://github.com/etcd-io/etcd/blob/main/client/v3/client.go#L243-L249 and https://github.com/etcd-io/etcd/blob/release-3.5/client/v3/client.go#L286
  3. etcd retries retry a max of 2.5 seconds: https://github.com/etcd-io/etcd/blob/main/client/v3/options.go#L53 + https://github.com/etcd-io/etcd/blob/main/client/v3/options.go#L45
  4. etcd retries are further reduced by zero-second retry on quorum
  5. On azure https://github.com/openshift/cluster-etcd-operator/blob/d7d43ee21aff6b178b2104228bba374977777a84/pkg/etcdenvvar/etcd_env.go#L229 slower leader change reactions https://github.com/openshift/cluster-etcd-operator/blob/master/pkg/hwspeedhelpers/hwhelper.go#L28 mean we are likely to exceed the number of retries for requests near the beginning of a change

Simply saying, "oh, it's hardcoded and kube" isn't good enough. We have previously had a storage shim to retry such problems. If all else fails, bringing back the small shim to retry Unavailable etcd errors longer is appropriate to fix all available clients.
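
If the shim route is taken, a minimal sketch of what it could look like (an assumption for illustration, not the proposed kube-apiserver change): a wrapper that keeps retrying gRPC Unavailable errors, such as "etcdserver: leader changed", beyond the client's built-in budget.

```
// Illustrative retry wrapper for gRPC Unavailable errors.
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// withUnavailableRetry retries op with exponential backoff as long as it
// returns codes.Unavailable and the context is not done.
func withUnavailableRetry(ctx context.Context, op func() error) error {
	backoff := 100 * time.Millisecond
	for {
		err := op()
		if err == nil || status.Code(err) != codes.Unavailable {
			return err // success, or an error we must not blindly retry
		}
		select {
		case <-ctx.Done():
			return err
		case <-time.After(backoff):
		}
		if backoff < 2*time.Second {
			backoff *= 2
		}
	}
}

func main() {
	attempts := 0
	err := withUnavailableRetry(context.Background(), func() error {
		attempts++
		if attempts < 3 {
			return status.Error(codes.Unavailable, "etcdserver: leader changed")
		}
		return nil
	})
	fmt.Println(err, "attempts:", attempts)
}
```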

Additionally, this etcd capability is being made more widely available and this bug prevents that from working.

This is a clone of issue OCPBUGS-25949. The following is the description of the original issue:

This is a clone of issue OCPBUGS-25708. The following is the description of the original issue:

Description of problem:

Changes made for faster risk cache-warming (the OCPBUGS-19512 series) introduced an unfortunate cycle:

1. Cincinnati serves vulnerable PromQL, like graph-data#4524.
2. Clusters pick up that broken PromQL, try to evaluate, and fail. Re-eval-and-fail loop continues.
3. Cincinnati PromQL fixed, like graph-data#4528.
4. Cases:

    • (a) Before the cache-warming changes, and also after this bug's fix, clusters pick up the fixed PromQL, try to evaluate, and start succeeding. Hooray!
    • (b) Clusters with the cache-warming changes but without this bug's fix say "it's been a long time since we pulled fresh Cincinnati information, but it has not been long since my last attempt to eval this broken PromQL, so let me skip the Cincinnati pull and re-eval that old PromQL", which fails. The re-eval-and-fail loop continues (see the sketch after this list).
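
A hedged sketch of the fetch-decision fix (the function and fields are illustrative, not the CVO's actual code): keying the "pull from Cincinnati again?" check off the age of the last successful fetch, rather than off how recently the cached PromQL was re-evaluated, is what breaks the cycle in case (b).

```
// Illustrative fetch-decision logic.
package main

import (
	"fmt"
	"time"
)

// shouldFetch decides whether to pull fresh Cincinnati data. The buggy
// variant keys off lastEvalAttempt, which keeps re-evaluating stale PromQL
// forever; keying off the fetch time breaks the cycle.
func shouldFetch(lastFetch, lastEvalAttempt time.Time, interval time.Duration) bool {
	return time.Since(lastFetch) >= interval
}

func main() {
	lastFetch := time.Now().Add(-15 * time.Minute)
	lastEval := time.Now().Add(-30 * time.Second)
	fmt.Println(shouldFetch(lastFetch, lastEval, 10*time.Minute)) // true -> pull the fixed PromQL
}
```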

Version-Release number of selected component (if applicable):

The regression went back via:

Updates from those releases (and later in their 4.y, until this bug lands a fix) to later releases are exposed.

How reproducible:

Likely very reproducible for exposed releases, but only when clusters are served PromQL risks that will consistently fail evaluation.

Steps to Reproduce:

1. Launch a cluster.
2. Point it at dummy Cincinnati data, as described in OTA-520. Initially declare a risk with broken PromQL in that data, like cluster_operator_conditions.
3. Wait until the cluster is reporting Recommended=Unknown for those risks (oc adm upgrade --include-not-recommended).
4. Update the risk to working PromQL, like group(cluster_operator_conditions). Alternatively, update anything about the update-service data (e.g. adding a new update target with a path from the cluster's version).
5. Wait 10 minutes for the CVO to have plenty of time to pull that new Cincinnati data.
6. oc get -o json clusterversion version | jq '.status.conditionalUpdates[].risks[].matchingRules[].promql.promql' | sort | uniq | jq -r .

Actual results:

Exposed releases will still have the broken PromQL in their output (or will lack the new update target you added, or whatever the Cincinnati data change was).

Expected results:

Fixed releases will have picked up the fixed PromQL in their output (or will have the new update target you added, or whatever the Cincinnati data change was).

Additional info:

Identification

To detect exposure in collected Insights, look for EvaluationFailed conditionalUpdates like:

$ oc get -o json clusterversion version | jq -r '.status.conditionalUpdates[].conditions[] | select(.type == "Recommended" and .status == "Unknown" and .reason == "EvaluationFailed" and (.message | contains("invalid PromQL")))'
{
  "lastTransitionTime": "2023-12-15T22:00:45Z",
  "message": "Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34\nAdding a new worker node will fail for clusters running on ARO. https://issues.redhat.com/browse/MCO-958",
  "reason": "EvaluationFailed",
  "status": "Unknown",
  "type": "Recommended"
} 

To distinguish this issue from other EvaluationFailed invalid-PromQL issues, you can look in the CVO logs for Cincinnati retrieval attempts. Example from a healthy cluster:

$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from\|PromQL' | tail
I1221 20:36:39.783530       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:36:39.831358       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"
I1221 20:40:19.674925       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:40:19.727998       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"
I1221 20:43:59.567369       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:43:59.620315       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"
I1221 20:47:39.457582       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:47:39.509505       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"
I1221 20:51:19.348286       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://api.openshift.com/api/upgrades_info/v1/graph?...
I1221 20:51:19.401496       1 promql.go:118] evaluate PromQL cluster condition: "(\n  group(cluster_operator_conditions{name=\"aro\"})\n  or\n  0 * group(cluster_operator_conditions)\n)\n"

showing fetch lines every few minutes. An exposed cluster, by contrast, shows only re-evaluation lines and no fetch lines:

$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from\|PromQL' | tail
I1221 20:50:10.165101       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:11.166170       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:12.166314       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:13.166517       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:14.166847       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:15.167737       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:16.168486       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:17.169417       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:18.169576       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
I1221 20:50:19.170544       1 availableupdates.go:123] Requeue available-update evaluation, because "4.13.25" is Recommended=Unknown: EvaluationFailed: Exposure to AROBrokenDNSMasq is unknown due to an evaluation failure: invalid PromQL result length must be one, but is 34
$ oc -n openshift-cluster-version logs -l k8s-app=cluster-version-operator --tail -1 --since 30m | grep 'request updates from' | tail
...no hits...

Recovery

If bitten, the remediation is to address the invalid PromQL. For example, we fixed that AROBrokenDNSMasq expression in graph-data#4528. After that, the local cluster administrator should restart their CVO, such as with:

$ oc -n openshift-cluster-version delete -l k8s-app=cluster-version-operator pods

Please review the following PR: https://github.com/openshift/thanos/pull/104

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When clicking on "Duplicate RoleBinding" in the OpenShift Container Platform Web Console, users are taken to a form where they can review the duplicated RoleBinding.

When the RoleBinding has a ServiceAccount as a subject, clicking "Create" leads to the following error:

An error occurred
Error "Unsupported value: "rbac.authorization.k8s.io": supported values: """ for field "subjects[0].apiGroup".

The root cause seems to be that the field "subjects[0].apiGroup" is set to "rbac.authorization.k8s.io" even for "kind: ServiceAccount" subjects. For ServiceAccount subjects this field must be left empty, and the "namespace" field should be set instead.

The functionality works as expected for User and Group subjects.
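
For illustration, a minimal sketch of what a working duplicated RoleBinding would look like for a ServiceAccount subject (all names here are placeholders, not taken from the report): the apiGroup field is omitted and the namespace field is set.

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: example-rolebinding-copy   # placeholder name for the duplicate
  namespace: example-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: example-role
subjects:
- kind: ServiceAccount             # ServiceAccount subjects take no apiGroup
  name: example-serviceaccount
  namespace: example-namespace     # the namespace field is required instead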

Version-Release number of selected component (if applicable):

OpenShift Container Platform 4.12.19

How reproducible:

Always

Steps to Reproduce:

1. In the OpenShift Container Platform Web Console, click on "User Management" => "Role Bindings"
2. Search for a RoleBinding that has a "ServiceAccount" as the subject. On the far right, click on the dots and choose "Duplicate RoleBinding"
3. Review the fields, set a new name for the duplicated RoleBinding, click "Create"

Actual results:

Duplicating fails with the following error message being shown:

An error occurred
Error "Unsupported value: "rbac.authorization.k8s.io": supported values: """ for field "subjects[0].apiGroup".

Expected results:

RoleBinding is duplicated without an error message

Additional info:

Reproduced with OpenShift Container Platform 4.12.18 and 4.12.19

Description of problem:
During the creation of a new HostedCluster, the control-plane-operator reports several lines of logs like

{"level":"error","ts":"2023-05-04T05:24:03Z","msg":"failed to remove service ca annotation and secret: %w","controller":"hostedcontrolplane","controllerGroup":"hypershift.openshift.io","controllerKind":"HostedControlPlane","hostedControlPlane":{"name":"demo-02","namespace":"clusters-demo-02"},"namespace":"clusters-demo-02","name":"demo-02","reconcileID":"5ffe0a7f-94ce-4745-b89d-4d5168cabe8d","error":"failed to get service: Service \"node-tuning-operator\" not found","stacktrace":"github.com/openshift/hypershift/control-plane-operator/controllers/hostedcontrolplane.(*HostedControlPlaneReconciler).reconcile\n\t/hypershift/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go:929\ngithub.com/openshift/hypershift/control-plane-operator/controllers/hostedcontrolplane.(*HostedControlPlaneReconciler).update\n\t/hypershift/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go:830\ngithub.com/openshift/hypershift/control-plane-operator/controllers/hostedcontrolplane.(*HostedControlPlaneReconciler).Reconcile\n\t/hypershift/control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go:677\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:121\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:320\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:273\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/hypershift/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:234"}

Until the Service / Secret are created.

Version-Release number of selected component (if applicable):

Management cluster: 4.14.0-nightly
Hosted Cluster: 4.13.0 or 4.14.0-nightly

How reproducible:

Always

Steps to Reproduce:

1. Create a hosted cluster

Actual results:

HostedCluster is created but there are several unnecessary "error" logs in the control-plane-operator

Expected results:

No error logs from control-plane-operator/controllers/hostedcontrolplane/hostedcontrolplane_controller.go:removeServiceCAAnnotationAndSecret() during normal cluster creation

Additional info:

 

Description of problem:

When a CatalogSource name starts with a number, its pod does not run well. Could we add a validation check for the name: if the name does not match the validation regex '[a-z]([-a-z0-9]*[a-z0-9])?', print a message and refuse to create the CatalogSource.

Version-Release number of selected component (if applicable):


How reproducible:

always 

Steps to Reproduce:

1. skopeo copy --all --format v2s2 docker://icr.io/cpopen/ibm-zcon-zosconnect-catalog@sha256:6f02ecef46020bcd21bdd24a01f435023d5fc3943972ef0d9769d5276e178e76 oci:///home1/611/oci-index
2. Change the working directory: `cd  home1/611/oci-index`
3. Run the oc-mirror command:
cat config.yaml 
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
storageConfig:
  local:
    path: /home1/ocilocalstorage
mirror:
  operators:
  - catalog: oci:///home1/611/oci-index

`oc-mirror --config config.yaml docker://ec2-18-217-58-249.us-east-2.compute.amazonaws.com:5000/multi-oci --dest-skip-tls --include-local-oci-catalogs`
4. Apply the CatalogSource and ICSP YAML files.
5. Check the CatalogSource pod.


Actual results:

[root@preserve-fedora36 oci-index]# oc get pod --show-labels 
NAME                                    READY   STATUS              RESTARTS   AGE     LABELS
611-oci-index-2sfh8                     0/1     Terminating         0          4s      olm.catalogSource=611-oci-index,olm.pod-spec-hash=6b8656f87
611-oci-index-dbj9b                     0/1     ContainerCreating   0          1s      olm.catalogSource=611-oci-index,olm.pod-spec-hash=6b8656f87
611-oci-index-w4tfd                     0/1     Terminating         0          2s      olm.catalogSource=611-oci-index,olm.pod-spec-hash=6b8656f87
611-oci-index-zj8nn                     0/1     Terminating         0          3s      olm.catalogSource=611-oci-index,olm.pod-spec-hash=6b8656f87

oc get catalogsource 611-oci-index -oyaml 
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  creationTimestamp: "2023-05-10T03:01:36Z"
  generation: 1
  name: 611-oci-index
  namespace: openshift-marketplace
  resourceVersion: "97108"
  uid: 2287434b-9e70-4865-b1a1-95997165f94e
spec:
  image: ec2-18-217-58-249.us-east-2.compute.amazonaws.com:5000/multi-oci/home1/611/oci-index:6f02ec
  sourceType: grpc
status:
  message: 'couldn''t ensure registry server - error ensuring service: 611-oci-index:
    Service "611-oci-index" is invalid: metadata.name: Invalid value: "611-oci-index":
    a DNS-1035 label must consist of lower case alphanumeric characters or ''-'',
    start with an alphabetic character, and end with an alphanumeric character (e.g.
    ''my-name'',  or ''abc-123'', regex used for validation is ''[a-z]([-a-z0-9]*[a-z0-9])?'')'
  reason: RegistryServerError

Expected results:

The CatalogSource should not be created when its name does not match the validation regex.

Additional info:
Renaming the CatalogSource to oci-611-index makes the pod run well, and the operator and its instance can be created.
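
For reference, a sketch of the same CatalogSource with a DNS-1035-compliant name (the renamed variant mentioned above; the image is the one from the output above):

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: oci-611-index   # starts with a letter, so it satisfies the DNS-1035 regex
  namespace: openshift-marketplace
spec:
  image: ec2-18-217-58-249.us-east-2.compute.amazonaws.com:5000/multi-oci/home1/611/oci-index:6f02ec
  sourceType: grpc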

 

Description of problem:

SCOS times out during provisioning of BM nodes

Version-Release number of selected component (if applicable):

4.13

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

https://github.com/openshift/ironic-image/pull/377

If the user provides a partial/empty/invalid CA certificate in the ignition endpoint override, the ignitionDownloadable/API_VIP validation will fail, but the user will not know why.
In the agent log we will see this error:

Failed to download worker.ign: unable to parse cert 

One option to let the user know about the problem is to return the error, in case of failure, as part of the APIVipConnectivityResponse, present it to the user, and use that value in the failing validation message.
This is a bit tricky: the current error messages are not user facing and we would need to adjust them.
It also requires API changes.
Another option is to validate the parameters the user provides up front.

The help text for --oci-registries-config should be updated to reference --include-local-oci-catalogs:


      --oci-registries-config string    Registries config file location (used only with --use-oci-feature flag)
Now that `--use-oci-feature` has been deprecated, the help information should reference --include-local-oci-catalogs instead.

If the kubeadmin secret was deleted successfully from the guest cluster but the `SecretHashAnnotation` annotation deletion on the oauthDeployment failed, the annotation is not reconciled again and will never be removed.

context: https://redhat-internal.slack.com/archives/C01C8502FMM/p1684765042825929

Description of problem:

CNCC failed to assign egressIP to NIC for Azure Workload Identity Cluster

Refer to https://issues.redhat.com/browse/CCO-294

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

Always

Steps to Reproduce:

1. Create an Azure Workload Identity cluster with "workflow-launch cucushift-installer-rehearse-azure-ipi-cco-manual-workload-identity-tp 4.14" from cluster-bot
2. Configure an EgressIP (see the sketch after these steps)
3.
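
A minimal sketch of the EgressIP object used in step 2; the name and IP match the output below, while the namespace selector is a placeholder:

apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-3
spec:
  egressIPs:
  - 10.0.128.100
  namespaceSelector:
    matchLabels:
      env: qe   # placeholder selector for the namespaces that should egress via this IP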

Actual results:

 % oc get egressip
NAME         EGRESSIPS      ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-3   10.0.128.100     

% oc get cloudprivateipconfig -o yaml
apiVersion: v1
items:
- apiVersion: cloud.network.openshift.io/v1
  kind: CloudPrivateIPConfig
  metadata:
    annotations:
      k8s.ovn.org/egressip-owner-ref: egressip-3
    creationTimestamp: "2023-08-14T04:41:05Z"
    finalizers:
    - cloudprivateipconfig.cloud.network.openshift.io/finalizer
    generation: 1
    name: 10.0.128.100
    resourceVersion: "65159"
    uid: 2b7b1137-0e2e-46e8-9bca-1176330322a9
  spec:
    node: ci-ln-b4tlp9t-1d09d-2chnb-worker-centralus1-jgqp2
  status:
    conditions:
    - lastTransitionTime: "2023-08-14T04:41:17Z"
      message: 'Error processing cloud assignment request, err: network.InterfacesClient#CreateOrUpdate:
        Failure sending request: StatusCode=0 -- Original Error: Code="LinkedAuthorizationFailed"
        Message="The client ''d367c1b8-9f5d-4257-b5c8-363f61af32c2'' with object id
        ''d367c1b8-9f5d-4257-b5c8-363f61af32c2'' has permission to perform action
        ''Microsoft.Network/networkInterfaces/write'' on scope ''/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-ln-b4tlp9t-1d09d/providers/Microsoft.Network/networkInterfaces/ci-ln-b4tlp9t-1d09d-2chnb-worker-centralus1-jgqp2-nic'';
        however, it does not have permission to perform action ''Microsoft.Network/virtualNetworks/subnets/join/action''
        on the linked scope(s) ''/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-ln-b4tlp9t-1d09d/providers/Microsoft.Network/virtualNetworks/ci-ln-b4tlp9t-1d09d-2chnb-vnet/subnets/ci-ln-b4tlp9t-1d09d-2chnb-worker-subnet''
        or the linked scope(s) are invalid."'
      observedGeneration: 1
      reason: CloudResponseError
      status: "False"
      type: Assigned
    node: ci-ln-b4tlp9t-1d09d-2chnb-worker-centralus1-jgqp2
kind: List
metadata:
  resourceVersion: ""

Expected results:

The EgressIP can be assigned to the egress node.

Additional info:


Description of problem:

When installing OCP in a disconnected network that doesn't have access to the public registry, bootkube.service fails.

Version-Release number of selected component (if applicable):

from 4.14.0-0.nightly-2023-04-29-153308

How reproducible:

Always

Steps to Reproduce:

1. Prepare a VPC that doesn't have access to the Internet, set up a mirror registry inside the VPC, and set the related ImageContentSources in the install-config
2. Start the installation
3.

Actual results:

Provisioning the masters failed because they couldn't get the master ignition from the bootstrap node:

May 04 07:31:56 maxu-az-dis-6d74v-bootstrap bootkube.sh[246724]: error: unable to read image registry.ci.openshift.org/ocp/release@sha256:227a73d8ff198a55ca0d3314d8fa94835d90769981d1c951ac741b82285f99fc: Get "https://registry.ci.openshift.org/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
May 04 07:31:56 maxu-az-dis-6d74v-bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILUREMay 04 07:31:56 maxu-az-dis-6d74v-bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.

Expected results:

Installation succeeded. 

Additional info:

In a disconnected install, we use the ICSP to pull images from the mirror registry, but bootkube.service was still trying to access the public registry. Checking the change log of bootkube.sh.template, this appears to be a regression from https://github.com/openshift/installer/pull/6990, which uses "oc adm release info -o 'jsonpath={.metadata.version}' "${RELEASE_IMAGE_DIGEST}"" to get the current OCP version in this scenario.
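
For reference, a minimal sketch of the kind of ImageContentSourcePolicy used in this scenario (the mirror hostname is a placeholder); the regression is that bootkube.sh ignores this mapping when resolving the release image:

apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: release-mirror   # placeholder name
spec:
  repositoryDigestMirrors:
  - source: registry.ci.openshift.org/ocp/release
    mirrors:
    - mirror.example.com:5000/ocp/release   # placeholder for the mirror registry inside the VPC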

This is a clone of issue OCPBUGS-19313. The following is the description of the original issue:

Description

As a user, I don't want to see the "DeploymentConfig" option in any form I am filling out when DeploymentConfigs are not installed in the cluster.

Acceptance Criteria

  1. Remove the DC option under the Resource Type dropdown in the following forms:
    • Import from Git
    • Container Image
    • Import JAR
    • Builder Images (Developer Catalog)

Additional Details:

Description of problem:

The following test is permafailing in Prow CI:
[tuningcni] sysctl allowlist update [It] should start a pod with custom sysctl only after adding sysctl to allowlist

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-kni-cnf-features-deploy-master-e2e-gcp-ovn-periodic/1640987392103944192


[tuningcni]
9915  /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:26
9916  sysctl allowlist update
9917  /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:141
9918    should start a pod with custom sysctl only after adding sysctl to allowlist
9919    /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:156
9920  > Enter [BeforeEach] [tuningcni] - /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/pkg/execute/ginkgo.go:9 @ 03/29/23 10:08:49.855
9921  < Exit [BeforeEach] [tuningcni] - /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/pkg/execute/ginkgo.go:9 @ 03/29/23 10:08:49.855 (0s)
9922  > Enter [BeforeEach] sysctl allowlist update - /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:144 @ 03/29/23 10:08:49.855
9923  < Exit [BeforeEach] sysctl allowlist update - /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:144 @ 03/29/23 10:08:49.896 (41ms)
9924  > Enter [It] should start a pod with custom sysctl only after adding sysctl to allowlist - /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:156 @ 03/29/23 10:08:49.896
9925  [FAILED] Unexpected error:
9926      <*errors.errorString | 0xc00044eec0>: {
9927          s: "timed out waiting for the condition",
9928      }
9929      timed out waiting for the condition
9930  occurred
9931  In [It] at: /go/src/github.com/openshift-kni/cnf-features-deploy/cnf-tests/testsuites/e2esuite/security/tuning.go:186 @ 03/29/23 10:09:53.377

Version-Release number of selected component (if applicable):

master (4.14)

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

Test fails

Expected results:

Test passes

Additional info:

PR https://github.com/openshift-kni/cnf-features-deploy/pull/1445 adds some useful information to the reported archive.

This is a clone of issue OCPBUGS-13034. The following is the description of the original issue:

Description of problem:

The cluster-api pod can't create events due to RBAC; we may miss useful events because of this.
E0503 07:20:44.925786       1 event.go:267] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"ad1-workers-f5f568855-vnzmn.175b911e43aa3f41", GenerateName:"", Namespace:"ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Machine", Namespace:"ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1", Name:"ad1-workers-f5f568855-vnzmn", UID:"2b40a694-d36d-4b13-9afc-0b5daeecc509", APIVersion:"cluster.x-k8s.io/v1beta1", ResourceVersion:"144260357", FieldPath:""}, Reason:"DetectedUnhealthy", Message:"Machine ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1/ad1-workers/ad1-workers-f5f568855-vnzmn/ has unhealthy node ", Source:v1.EventSource{Component:"machinehealthcheck-controller", Host:""}, FirstTimestamp:time.Date(2023, time.May, 3, 7, 20, 44, 923289409, time.Local), LastTimestamp:time.Date(2023, time.May, 3, 7, 20, 44, 923289409, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events is forbidden: User "system:serviceaccount:ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1:cluster-api" cannot create resource "events" in API group "" in the namespace "ocm-integration-23frm3gtnh3cf212daoe1a13su7buqk4-ad1"' (will not retry!)

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Always

Steps to Reproduce:

1. Create a hosted cluster
2. Check the cluster-api pod for errors of this kind (e.g. triggered by slow node startup)
3.

Actual results:

Error

Expected results:

Event generated

Additional info:
ClusterRole hypershift-cluster-api is created here https://github.com/openshift/hypershift/blob/e7eb32f259b2a01e5bbdddf2fe963b82b331180f/hypershift-operator/controllers/hostedcluster/hostedcluster_controller.go#L2720

We should add create/patch/update permissions for events there.
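
A sketch of the rule that would need to be added (only the new events rule is shown; the existing rules stay as they are):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: hypershift-cluster-api
rules:
- apiGroups: [""]                 # core API group
  resources: ["events"]
  verbs: ["create", "patch", "update"]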

Description of problem:

CheckNodePerf runs on non-master nodes when the worker role label is not present.

Version-Release number of selected component (if applicable):

 

How reproducible:

In a VMware cluster, create an infra MCP and label a node with the infra role.

vsphere-problem-detector-operator will produce CheckNodePerf alerts and log entries like:

CheckNodePerf: xxxxxx failed: master node has disk latency of greater than 100ms

https://docs.openshift.com/container-platform/4.10/machine_management/creating-infrastructure-machinesets.html#creating-infra-machines_creating-infrastructure-machinesets

Steps to Reproduce:

1.
2.
3.

Actual results:

CheckNodePerf: xxxxx failed: master node has disk latency of greater than 100ms

Expected results:

no log entry, and no alert

Additional info:

The code only considers the worker and master labels, and it has very complex nesting of conditions.

https://github.com/openshift/vsphere-problem-detector/blob/ca408db88a70cfa5aefa3128dff971a555994c29/pkg/check/node_perf.go#L133-L143

Please review the following PR: https://github.com/openshift/prom-label-proxy/pull/355

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Labels added in the Git import flow are not propagated to the pipeline resources when a pipeline is added

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Go to the Git Import form
2. Add Pipeline
3. Add labels
4. Submit the form

Actual results:

The added labels are not propagated to the pipeline resources

Expected results:

The added labels should be added to the pipeline resources

Additional info:

 

This is a clone of issue OCPBUGS-19909. The following is the description of the original issue:

Description of problem:

Build timing test is failing due to faster run times on Bare Metal

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Run [sig-builds][Feature:Builds][timing] capture build stages and durations should record build stages and durations for docker
2.
3.

Actual results:

{  fail [github.com/openshift/origin/test/extended/builds/build_timing.go:101]: Stage PushImage ran for 95, expected greater than 100ms
Expected
    <bool>: true
to be false
Ginkgo exit error 1: exit with code 1}

Expected results:

Test should pass

Additional info:

 

Description of problem:

Installing the cert-manager operator version cert-manager-operator-bundle:v1.11.1-6 from the console, the version shown in the UI flips between v1.11.1 and v1.10.2 constantly.

Version-Release number of selected component (if applicable):

cert-manager-operator-bundle:v1.11.1-6, 4.13.0-0.nightly-2023-05-18-195839

How reproducible:

Always. I tried a few times in different envs, double confirmed.

Steps to Reproduce:

1. Install cert-manager operator of version cert-manager-operator-bundle:v1.11.1-6 from console
2. Watch console
3.

Actual results:

The version shown in the UI flips between v1.11.1 and v1.10.2 constantly.
See attached video https://drive.google.com/drive/folders/1AFWquCK-pDCoQFMEOONQwGByBUg6tKR9?usp=sharing .

Expected results:

Should always show v1.11.1

Additional info:

Both the v4.13 index image brew.registry.redhat.io/rh-osbs/iib:500235 (taken from the email "[CVP] (SUCCESS) (cvp-redhatopenshiftcfe: cert-manager-operator-bundle-container-v1.11.1-6)") and brew.registry.redhat.io/rh-osbs/iib-pub-pending:v4.13 reproduce it.

 

Description of problem:

https://issues.redhat.com//browse/OCPBUGS-5287 disabled the test due to https://issues.redhat.com/browse/THREESCALE-9015. Once https://issues.redhat.com/browse/THREESCALE-9015 is resolved, we need to re-enable the test.

Description of problem:

The oc patch project command fails to annotate the project.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1. Run the patch command below to update the annotation on an existing project
~~~
oc patch project <PROJECT_NAME> --type merge --patch '{"metadata":{"annotations":{"openshift.io/display-name": "null","openshift.io/description": "This is a new project"}}}'
~~~


Actual results:

It produces the error output below:
~~~
The Project "<PROJECT_NAME>" is invalid: * metadata.namespace: Invalid value: "<PROJECT_NAME>": field is immutable * metadata.namespace: Forbidden: not allowed on this type 
~~~ 

Expected results:

The `oc patch project` command should patch the project with specified annotation.

Additional info:

Patching the project on OCP 4.11.26 worked as expected:
~~~
oc patch project <PROJECT_NAME> --type merge --patch '{"metadata":{"annotations":{"openshift.io/display-name": "null","openshift.io/description": "New project"}}}'

project.project.openshift.io/<PROJECT_NAME> patched
~~~

The issue is with OCP 4.12, where it is not working. 

 

The dns operator appears to have begun frequently spamming kube Events in some serial jobs across multiple clouds. (especially gcp and azure, aws is less common but there are some failures with the same signature)

The pathological events test catches this, and it appears to have started on May 5th. See the Pass Rate By NURP+ Combination panel for where this is most common.

As of the date of filing, pass rates are:
56% - gcp, amd64, sdn, ha, serial, techpreview
57% - gcp, amd64, sdn, ha, serial
60% - azure, amd64, ovn, ha, serial
60% - azure, amd64, ovn, ha, serial, techpreview

The events seem to consistently appear as follows on all clouds:

ns/openshift-dns service/dns-default hmsg/ade328ddf3 - pathological/true reason/TopologyAwareHintsDisabled Unable to allocate minimum required endpoints to each zone without exceeding overload threshold (5 endpoints, 3 zones), addressType: IPv4 From: 08:58:41Z To: 08:58:42Z

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-azure-sdn-techpreview-serial/1656207924667617280 (intervals)

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-sdn-techpreview-serial/1656207916375478272

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-sdn-serial/1655277608981499904

The Intervals item under "Debug Tools" is a great way to see these charted over time; see the "interesting events" section.

 

test=[sig-arch] events should not repeat pathologically for namespace openshift-dns

On https://issues.redhat.com/browse/RFE-2273 the customer analyzed quite correctly:

I have re-reviewed all of the provided data from the attached cases (DHL and ANZ) and have documented my findings below:
1) It looks like the request mentioned by the customer is sent to the Console API. Specifically `api/prometheus-tenancy/api/v1/*`
2) This is then forwarded to Cluster Monitoring (Thanos Querier) [0]
3) Thanos is configured to set the CORS headers to `*` due to the absence of the `--web.disable-cors` argument.[1]
4) The Thanos deployment is managed by the Cluster Monitoring Operator directly [2]
5) When using Postman, we can see the endpoint respond with a `access-control-allow-origin: *` [see image 1]
6) Manually setting the `--web.disable-cors` argument inside the Thanos Querier deployment, the `access-control-allow-origin: *` is removed.
7) Changing the Cluster Monitoring Operator deployment template[4] to include the flag and push the custom image into an OCP 4.10.31 cluster [3]
8) Seems like everything is working and the endpoint is not longer returning the CORS header. [see image 2]

We should set --web.disable-cors on our Thanos Querier deployment. We don't load any cross-origin resources through the console -> Thanos Querier path, so this should just work.
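
A sketch of what this could look like in the Thanos Querier deployment (the container name and surrounding fields are illustrative; only the added flag is the point):

# hypothetical excerpt of the thanos-querier Deployment pod template
spec:
  containers:
  - name: thanos-query
    args:
    - query
    - --web.disable-cors   # stop Thanos from setting access-control-allow-origin: *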

Description of problem:

When an HCP is running and creating the HostedCluster pods, it renders this IgnitionProxy config:

defaults
  mode http
  timeout connect 5s
  timeout client 30s
  timeout server 30s

frontend ignition-server
  bind *:8443 ssl crt /tmp/tls.pem
  default_backend ignition_servers

backend ignition_servers
  server ignition-server ignition-server:443 check ssl ca-file /etc/ssl/root-ca/ca.crt

This configuration is not supported on IPv6, causing the worker nodes to fail to download the ignition payload.

 

Version-Release number of selected component (if applicable):

MCE 2.4
OCP 4.14

How reproducible:

Always

Steps to Reproduce:

1. Create a HostedCluster with the networking parameters set to IPv6 networks (see the sketch after these steps).
2. Check the IgnitionProxy config using: 

oc rsh <pod>
cat /tmp/haproxy.conf
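
For step 1, a minimal sketch of the IPv6 networking stanza, assuming the standard HostedCluster spec.networking fields; the names and CIDR values are placeholders:

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: hosted          # placeholder
  namespace: clusters
spec:
  networking:
    networkType: OVNKubernetes
    clusterNetwork:
    - cidr: fd01::/48   # placeholder IPv6 CIDRs
    serviceNetwork:
    - cidr: fd02::/112
    machineNetwork:
    - cidr: fd00::/64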

Actual results:

The agent on the destination workers fails with:

Jul 26 10:23:44 localhost.localdomain next_step_runne[4242]: time="26-07-2023 10:23:44" level=error msg="ignition file download failed: request failed: Get \"https://ignition-server-clusters-hosted.apps.ocp-edge-cluster-0.qe.lab.redhat.com/ignition\": EOF" file="apivip_check.go:160"

Expected results:

The worker should download the ignition payload properly

Additional info:

N/A

This is a clone of issue OCPBUGS-20307. The following is the description of the original issue:

Description of problem:

Backport PRs 28295 and 28238 need to merge in 4.14.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Running the following tests using OpenShift on OpenStack with Kuryr:
"[sig-cli] oc idle [apigroup:apps.openshift.io][apigroup:route.openshift.io][apigroup:project.openshift.io][apigroup:image.openshift.io] by all [Suite:openshift/conformance/parallel]"
"[sig-cli] oc idle [apigroup:apps.openshift.io][apigroup:route.openshift.io][apigroup:project.openshift.io][apigroup:image.openshift.io] by checking previous scale [Suite:openshift/conformance/parallel]"
"[sig-cli] oc idle [apigroup:apps.openshift.io][apigroup:route.openshift.io][apigroup:project.openshift.io][apigroup:image.openshift.io] by label [Suite:openshift/conformance/parallel]"
"[sig-cli] oc idle [apigroup:apps.openshift.io][apigroup:route.openshift.io][apigroup:project.openshift.io][apigroup:image.openshift.io] by name [Suite:openshift/conformance/parallel]"

Fails waiting for endpoints
STEP: wait until endpoint addresses are scaled to 2 01/21/23 01:16:42.024
Jan 21 01:16:42.025: INFO: Running 'oc --namespace=e2e-test-oc-idle-h2mvt --kubeconfig=/tmp/configfile3007731725 get endpoints idling-echo --template={{ len (index .subsets 0).addresses }} --output=go-template'
Jan 21 01:16:42.158: INFO: Error running /usr/local/bin/oc --namespace=e2e-test-oc-idle-h2mvt --kubeconfig=/tmp/configfile3007731725 get endpoints idling-echo --template={{ len (index .subsets 0).addresses }} --output=go-template:
StdOut>
Error executing template: template: output:1:8: executing "output" at <index .subsets 0>: error calling index: index of untyped nil. Printing more information for debugging the template:
    template was:
        {{ len (index .subsets 0).addresses }}
    raw data was:
        {"apiVersion":"v1","kind":"Endpoints","metadata":{"annotations":{"endpoints.kubernetes.io/last-change-trigger-time":"2023-01-21T01:16:40Z"},"creationTimestamp":"2023-01-21T01:16:40Z","labels":{"app":"idling-echo"},"managedFields":[{"apiVersion":"v1","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:endpoints.kubernetes.io/last-change-trigger-time":{}},"f:labels":{".":{},"f:app":{}}}},"manager":"kube-controller-manager","operation":"Update","time":"2023-01-21T01:16:40Z"}],"name":"idling-echo","namespace":"e2e-test-oc-idle-h2mvt","resourceVersion":"409973","uid":"91cd122e-b418-4e29-98c6-2ff757c74a15"}}
    object given to template engine was:
        map[apiVersion:v1 kind:Endpoints metadata:map[annotations:map[endpoints.kubernetes.io/last-change-trigger-time:2023-01-21T01:16:40Z] creationTimestamp:2023-01-21T01:16:40Z labels:map[app:idling-echo] managedFields:[map[apiVersion:v1 fieldsType:FieldsV1 fieldsV1:map[f:metadata:map[f:annotations:map[.:map[] f:endpoints.kubernetes.io/last-change-trigger-time:map[]] f:labels:map[.:map[] f:app:map[]]]] manager:kube-controller-manager operation:Update time:2023-01-21T01:16:40Z]] name:idling-echo namespace:e2e-test-oc-idle-h2mvt resourceVersion:409973 uid:91cd122e-b418-4e29-98c6-2ff757c74a15]]

When using 60 seconds instead of 30 in PollImmediate, the tests pass.

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2023-01-19-110743

How reproducible:

Consistently 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Prerequisites (if any, like setup, operators/versions):

Steps to Reproduce

  1. Follow https://github.com/openshift/console/blob/master/docs/helm/configure-namespaced-helm-repos.md for adding project Helm chart repositories that support basic auth.
  2. If we create a repository and provide basicAuthConfig as shown in the current documentation, we get an error.
  3. The documentation needs an update here: the basicAuthConfig secret should be specified with a `name` field, as shown in the sketch after this list.
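
A sketch of what the corrected documentation example should look like, assuming basicAuthConfig is a secret-name reference (the repository URL and secret name are placeholders):

apiVersion: helm.openshift.io/v1beta1
kind: ProjectHelmChartRepository
metadata:
  name: private-repo                  # placeholder
spec:
  connectionConfig:
    url: https://charts.example.com   # placeholder repository URL
    basicAuthConfig:
      name: helm-basic-auth           # Secret in the same namespace holding the username/password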

Actual results:

  1. * spec.connectionConfig.basicAuthConfig: Invalid value: "string": spec.connectionConfig.basicAuthConfig in body must be of type object: "string"

    Expected results:

We should be able to add a repository that supports basic auth.

Reproducibility (Always/Intermittent/Only Once):

Build Details:

Additional info:

Documentation Requirement: Yes/No (needs-docs|upstream-docs / no-doc)

Upstream: <Inputs/Requirement details>/ Not Applicable

Downstream: <Type: Doc defect/More inputs to doc>/ Not Applicable

Provide link to the relevant section
Provide doc inputs and details required

Release Notes Type: <New Feature/Enhancement/Known Issue/Bug
fix/Breaking change/Deprecated Functionality/Technology Preview>

Description of problem:
The same applies to OCP 4.14.

In OCP 4.13, when trying to reach the Prometheus UI via port-forward, e.g. `oc port-forward prometheus-k8s-0`, the UI URL ($HOST:9090/graph) returns `Error opening React index.html: open web/ui/static/react/index.html: no such file or directory`.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-01-24-061922

How reproducible:

100%

Steps to Reproduce:

1.  oc -n openshift-monitoring port-forward prometheus-k8s-0 9090:9090 --address='0.0.0.0' 

2. curl http://localhost:9090/graph

Actual results:

Error opening React index.html: open web/ui/static/react/index.html: no such file or directory

Expected results:

Prometheus UI is loaded

Additional info:

 The UI loads fine when following the same steps in 4.12.

This is a clone of issue OCPBUGS-19550. The following is the description of the original issue:

Multus doesn't need to watch pods on other nodes. To save memory and CPU, set MULTUS_NODE_NAME so that Multus only watches pods on its own node.
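
A sketch of how the node name could be injected into the multus daemonset via the downward API (the container name is illustrative; only the env stanza is the point):

# hypothetical excerpt of the multus DaemonSet pod template
spec:
  containers:
  - name: kube-multus
    env:
    - name: MULTUS_NODE_NAME        # lets multus limit its pod watch to its own node
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName  # downward API: the node this pod is scheduled on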

Description of problem:

This issue is triggered by the lack of the file "/etc/kubernetes/kubeconfig" on the node, but what I found interesting is the ugly error that follows:

2023-01-04T10:56:50.807982171Z I0104 10:56:50.807918   18013 start.go:112] Version: v4.11.0-202212070335.p0.g60746a8.assembly.stream-dirty (60746a843e7ef8855ae00f2ffcb655c53e0e8296)
2023-01-04T10:56:50.810326376Z I0104 10:56:50.810190   18013 start.go:125] Calling chroot("/rootfs")
2023-01-04T10:56:50.810326376Z I0104 10:56:50.810274   18013 update.go:1972] Running: systemctl start rpm-ostreed
2023-01-04T10:56:50.855151883Z I0104 10:56:50.854666   18013 rpm-ostree.go:353] Running captured: rpm-ostree status --json
2023-01-04T10:56:50.899635929Z I0104 10:56:50.899574   18013 rpm-ostree.go:353] Running captured: rpm-ostree status --json
2023-01-04T10:56:50.941236704Z I0104 10:56:50.941179   18013 daemon.go:236] Booted osImageURL: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:318187717bd19ef265000570d5580ea680dfbe99c3bece6dd180537a6f268f
e1 (410.84.202210061459-0)
2023-01-04T10:56:50.973206073Z I0104 10:56:50.973131   18013 start.go:101] Copied self to /run/bin/machine-config-daemon on host
2023-01-04T10:56:50.973259966Z E0104 10:56:50.973196   18013 start.go:177] failed to load kubelet kubeconfig: open /etc/kubernetes/kubeconfig: no such file or directory
2023-01-04T10:56:50.975399571Z panic: runtime error: invalid memory address or nil pointer dereference
2023-01-04T10:56:50.975399571Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x173d84f]
2023-01-04T10:56:50.975399571Z
2023-01-04T10:56:50.975399571Z goroutine 1 [running]:
2023-01-04T10:56:50.975399571Z main.runStartCmd(2023-01-04T10:56:50.975436752Z 0x2c3da80?, {0x1bc0b3b?, 0x0?, 0x0?})
2023-01-04T10:56:50.975436752Z  /go/src/github.com/openshift/machine-config-operator/cmd/machine-config-daemon/start.go:179 +0x70f
2023-01-04T10:56:50.975436752Z github.com/spf13/cobra.(*Command).execute(0x2c3da80, {0x2c89310, 0x0, 0x0})
2023-01-04T10:56:50.975436752Z  /go/src/github.com/openshift/machine-config-operator/vendor/github.com/spf13/cobra/command.go:860 +0x663
2023-01-04T10:56:50.975448580Z github.com/spf13/cobra.(*Command).ExecuteC(0x2c3d580)
2023-01-04T10:56:50.975448580Z  /go/src/github.com/openshift/machine-config-operator/vendor/github.com/spf13/cobra/command.go:974 +0x3b4
2023-01-04T10:56:50.975456464Z github.com/spf13/cobra.(*Command).Execute(...)
2023-01-04T10:56:50.975456464Z  2023-01-04T10:56:50.975464649Z /go/src/github.com/openshift/machine-config-operator/vendor/github.com/spf13/cobra/command.go:902
2023-01-04T10:56:50.975464649Z k8s.io/component-base/cli.Run(2023-01-04T10:56:50.975472575Z 0x2c3d580)
2023-01-04T10:56:50.975472575Z  /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/component-base/cli/run.go:105 +0x385
2023-01-04T10:56:50.975485076Z main.main()
2023-01-04T10:56:50.975485076Z  /go/src/github.com/openshift/machine-config-operator/cmd/machine-config-daemon/main.go:28 +0x25

Version-Release number of selected component (if applicable):

4.11.20

How reproducible:

Always

Steps to Reproduce:

1. Remove / change the name of the file "/etc/kubernetes/kubeconfig"
2. Delete machine-config-daemon pod
3. 

Actual results:

2023-01-04T10:56:50.973259966Z E0104 10:56:50.973196   18013 start.go:177] failed to load kubelet kubeconfig: open /etc/kubernetes/kubeconfig: no such file or directory
2023-01-04T10:56:50.975399571Z panic: runtime error: invalid memory address or nil pointer dereference

Expected results:

A fatal error:

failed to load kubelet kubeconfig: open /etc/kubernetes/kubeconfig: no such file or directory

but no runtime panic (nil pointer dereference).

Additional info:

https://github.com/openshift/machine-config-operator/blob/92012a837e2ed0ed3c9e61c715579ac82ad0a464/cmd/machine-config-daemon/start.go#L179

Description of problem:

Deploying a helm chart that features a values.schema.json using either the 2019-09 or 2020-12 (latest) revision of JSON Schema results in the UI hanging on create with three loading dots. This is not the case if the YAML view is used, since I suppose that view is not trying to be clever and lets Helm validate the chart values against the schema itself.

Version-Release number of selected component (if applicable):

Reproduced in 4.13, probably affects other versions as well.

How reproducible:

100%

Steps to Reproduce:

1. Go to Helm tab.
2. Click create in top right and select Repository
3. Paste following into YAML view and click Create:

apiVersion: helm.openshift.io/v1beta1
kind: ProjectHelmChartRepository
metadata:
  name: reproducer
spec:
  connectionConfig:
    url: 'https://raw.githubusercontent.com/tumido/helm-backstage/blog2'

4. Go to the Helm tab again (if redirected elsewhere)
5. Click create in top right and select Helm Release
6. In catalog filter select Chart repositories: Reproducer
7. Click on the single tile available (Backstage) and click Create
8. Switch to Form view
9. Leave default values and click Create
10. Stare at the always loading screen that never proceeds further.

Actual results:

Expected results:

It installs and deploys the chart

Additional info:

This is caused by a JSON Schema containing a $schema key indicating which revision of the JSON Schema standard should be used:

{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
}

I've managed to trace this back to this react-jsonschema-form issue:

https://github.com/rjsf-team/react-jsonschema-form/issues/2241

It seems the library used here for validation doesn't support the 2019-09 draft or the most current 2020-12 revision.

It happens only if the chart follows the JSON Schema standard and declares the revision properly.

Workarounds:

IMO the best solution:
The Helm form renderer should NOT do any validation, since it can't handle the schema properly. Instead, it should leave this job to the Helm backend; Helm validates the values against the schema when installing the chart anyway. The YAML view already does no client-side validation, and it seems to do the job properly.
 
Currently, there is no formal requirement for charts admitted to the Helm curated catalog stating that the most recent supported JSON Schema revision is 4 years old and that the 2 later revisions are not supported.

Also, the Form UI should not just hang on submit. Instead, it should at least fail gracefully.

 

Related to:

https://github.com/janus-idp/helm-backstage/issues/64#issuecomment-1587678319

Description of problem:

As discovered in https://bugzilla.redhat.com/show_bug.cgi?id=2111632, the dispatcher scripts don't have permission to set the hostname directly. We need to use systemd-run to get them into an appropriate SELinux context.

I doubt the static DHCP scripts are still being used intentionally since we have proper static IP support now, but the fix is pretty trivial, so we should go ahead and do it; technically the feature is still supported.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Refer to the CIS RedHat OpenShift Container Platform Benchmark PDF: https://drive.google.com/file/d/12o6O-M2lqz__BgmtBrfeJu1GA2SJ352c/view
1.1.7 Ensure that the etcd pod specification file permissions are set to 600 or more restrictive (Manual)
======================================================================================================
As per CIS v1.3 PDF permissions should be 600 with the following statement:
"The pod specification file is created on control plane nodes at /etc/kubernetes/manifests/etcd-member.yaml with permissions 644. Verify that the permissions are 600 or more restrictive."
But when I ran the following command, it showed 644 permissions:

for i in $(oc get pods -n openshift-etcd -l app=etcd -o name | grep etcd )
do
echo "check pod $i"
oc rsh -n openshift-etcd $i \
stat -c %a /etc/kubernetes/manifests/etcd-pod.yaml
done

We set image links in CMO's jsonnet code, as these can sometimes be used to populate labels, and it is generally considered good documentation practice.

In a cluster these links are replaced by CVO.

prometheus-adapter is now a k8s project and has moved locations accordingly from directxman12/k8s-prometheus-adapter to kubernetes-sigs/prometheus-adapter. This should be reflected in our image links, set at https://github.com/openshift/cluster-monitoring-operator/blob/35a063722c7e3b68d57aed18dc81f0dbdfbfc004/jsonnet/main.jsonnet#L66.

This is a clone of issue OCPBUGS-19370. The following is the description of the original issue:

Description of problem:

For the HCP resources:
  "cloud-network-config-controller"
  "multus-admission-controller"
  "ovnkube-control-plane"

the `hypershift.openshift.io/hosted-control-plane:{hostedcluster resource namespace}-{cluster-name}` label is not present.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

100%

Steps to Reproduce:

1. create a hosted cluster 
2. check the labels of those resources
e.g. `$ oc get pod multus-admission-controller-7c677c745c-l4dbc  -oyaml` to check the labels of it.

Or refer testcase: ocp-44988

Actual results:

no expected label found

Expected results:

the pods have the label:
`hypershift.openshift.io/hosted-control-plane:{hostedcluster resource namespace}-{cluster-name}`

Additional info:

 

Description of problem:

When a (recommended/conditional) release image is provided with --to-image='', the specified image name is not preserved in the ClusterVersion object.

Version-Release number of selected component (if applicable):

 

How reproducible:

100% with oc >4.9

Steps to Reproduce:

$ oc version
Client Version: 4.12.2
Kustomize Version: v4.5.7
Server Version: 4.12.2
Kubernetes Version: v1.25.4+a34b9e9

$ oc get clusterversion/version -o jsonpath='{.status.desired}'|jq
{
  "channels": [
    "candidate-4.12",
    "candidate-4.13",
    "eus-4.12",
    "fast-4.12",
    "stable-4.12"
  ],
  "image": "quay.io/openshift-release-dev/ocp-release@sha256:31c7741fc7bb73ff752ba43f5acf014b8fadd69196fc522241302de918066cb1",
  "url": "https://access.redhat.com/errata/RHSA-2023:0569",
  "version": "4.12.2"
}
$ oc adm release info 4.12.3 -o jsonpath='{.image}'
quay.io/openshift-release-dev/ocp-release@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36
$ skopeo copy docker://quay.io/openshift-release-dev/ocp-release@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36 docker://quay.example.com/playground/release-images
Getting image source signatures
Copying blob 64096b96a7b0 done  
Copying blob 0e0550faf8e0 done  
Copying blob 97da74cc6d8f skipped: already exists  
Copying blob d8190195889e skipped: already exists  
Copying blob 17997438bedb done  
Copying blob fdbb043b48dc done  
Copying config b49bc8b603 done  
Writing manifest to image destination
Storing signatures
$ skopeo inspect docker://quay.example.com/playground/release-images@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36|jq '.Name,.Digest'
"quay.example.com/playground/release-images"
"sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36"
$ oc adm upgrade --to-image=quay.example.com/playground/release-images@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36
Requesting update to 4.12.3
 

Actual results:

$ oc get clusterversion/version -o jsonpath='{.status.desired}'|jq
{
  "channels": [
    "candidate-4.12",
    "candidate-4.13",
    "eus-4.12",
    "fast-4.12",
    "stable-4.12"
  ],
  "image": "quay.io/openshift-release-dev/ocp-release@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36",    <--- not quay.example.com
  "url": "https://access.redhat.com/errata/RHSA-2023:0728",
  "version": "4.12.3"
}

$ oc get clusterversion/version -o jsonpath='{.status.history}'|jq
[
  {
    "completionTime": null,
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36",         <--- not quay.example.com
    "startedTime": "2023-04-28T07:39:11Z",
    "state": "Partial",
    "verified": true,
    "version": "4.12.3"
  },
  {
    "completionTime": "2023-04-27T14:48:06Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:31c7741fc7bb73ff752ba43f5acf014b8fadd69196fc522241302de918066cb1",
    "startedTime": "2023-04-27T14:24:29Z",
    "state": "Completed",
    "verified": false,
    "version": "4.12.2"
  }
]

Expected results:

$ oc get clusterversion/version -o jsonpath='{.status.desired}'|jq
{
  "channels": [
    "candidate-4.12",
    "candidate-4.13",
    "eus-4.12",
    "fast-4.12",
    "stable-4.12"
  ],
  "image": "quay.example.com/playground/release-images@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36 ",
  "url": "https://access.redhat.com/errata/RHSA-2023:0728",
  "version": "4.12.3"
}$ oc get clusterversion/version -o jsonpath='{.status.history}'|jq
[
  {
    "completionTime": null,
    "image": "quay.example.com/playground/release-images@sha256:382f271581b9b907484d552bd145e9a5678e9366330059d31b007f4445d99e36 ",
    "startedTime": "2023-04-28T07:39:11Z",
    "state": "Partial",
    "verified": true,
    "version": "4.12.3"
  },
  {
    "completionTime": "2023-04-27T14:48:06Z",
    "image": "quay.io/openshift-release-dev/ocp-release@sha256:31c7741fc7bb73ff752ba43f5acf014b8fadd69196fc522241302de918066cb1",
    "startedTime": "2023-04-27T14:24:29Z",
    "state": "Completed",
    "verified": false,
    "version": "4.12.2"
  }
]

Additional info:

While in earlier versions (<4.10) we used to preserve the specified image [1], we now (as of 4.10) store the public image as the desired version [2].
[1] https://github.com/openshift/oc/blob/88cfeb4aa2d74ee5f5598c571661622c0034081b/pkg/cli/admin/upgrade/upgrade.go#L278
[2] https://github.com/openshift/oc/blob/5711859fac135177edf07161615bdabe3527e659/pkg/cli/admin/upgrade/upgrade.go#L278 

Description of problem:

When deploying hosts using ironic's agent, both the ironic service address and the inspector address are required.

The ironic service is proxied such that it can be accessed at a consistent endpoint regardless of where the pod is running. This is not the case for the inspection service.

This means that if the inspection service moves after we find the address, provisioning will fail.

In particular, this non-matching behavior is frustrating when using the CBO GetIronicIP function (https://github.com/openshift/cluster-baremetal-operator/blob/6f0a255fdcc7c0e5c04166cb9200be4cee44f4b7/provisioning/utils.go#L95-L127), as one return value is usable forever but the other needs to somehow be re-queried every time the pod moves.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Relatively

Steps to Reproduce:

1. Retrieve the inspector IP from GetIronicIP
2. Reschedule the inspector service pod
3. Provision a host

Actual results:

Ironic python agent raises an exception

Expected results:

Host provisions

Additional info:

This was found while deploying clusters using ZTP

In this scenario specifically an image containing the ironic inspector IP is valid for an extended period of time. The same image can be used for multiple hosts and possibly multiple different spoke clusters.

Our controller shouldn't be expected to watch the ironic pod to ensure we update the image whenever it moves. The best we can do is re-query the inspector IP whenever a user makes changes to the image, but that may still not be often enough.

In many cases, the /dev/disk/by-path symlink is the only way to stably identify a disk without having prior knowledge of the hardware from some external source (e.g. a spreadsheet of disk serial numbers). It should be possible to specify this path in the root device hints.
This is fixed by the first commit in the upstream Metal³ PR https://github.com/metal3-io/baremetal-operator/pull/1264
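
For illustration, a minimal sketch of what such a hint could look like once the by-path support from the linked Metal³ change is available (the host name, namespace, and by-path value are purely illustrative and hardware-specific):

apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0
  namespace: openshift-machine-api
spec:
  rootDeviceHints:
    # stable by-path symlink of the intended root disk
    deviceName: /dev/disk/by-path/pci-0000:00:1f.2-ata-1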

Description of problem:

oc explain tests have to be enabled to ensure openapi/v3 is working properly

The tests have been temporarily disabled in order to unblock the oc kube bump (https://github.com/openshift/oc/pull/1420). 

The following efforts need to be done/merged to make openapi/v3 work:

- [DONE] oauth-apiserver kube bump: https://github.com/openshift/oauth-apiserver/pull/89
- [DONE] merge kubectl fix backport https://github.com/kubernetes/kubernetes/pull/118930 and bump kube dependency in oc to include this fix (https://github.com/openshift/oc/pull/1515)
- [DONE] merge https://github.com/kubernetes/kubernetes/pull/118881 and carry this PR in our kube-apiserver to stop oc explain being flaky (https://github.com/openshift/kubernetes/pull/1629)
- [DONE] merge https://github.com/kubernetes/kubernetes/pull/118879 and carry this PR in our kube-apiserver to enable apiservices (https://github.com/openshift/kubernetes/pull/1630)
- [DONE] make openapi/v3 work for our special groups https://github.com/openshift/kubernetes/pull/1654 (https://github.com/openshift/kubernetes/pull/1617#issuecomment-1609864043, slack discussion: https://redhat-internal.slack.com/archives/CC3CZCQHM/p1687882255536949?thread_ts=1687822265.954799&cid=CC3CZCQHM)
- [DONE] enable back oc explain tests: https://github.com/openshift/origin/pull/28155 and bring in new tests: https://github.com/openshift/origin/pull/28129
- [OPTIONAL] bring in additional upstream kubectl/oc explain tests: https://github.com/kubernetes/kubernetes/pull/118885
- [OPTIONAL] backport https://github.com/kubernetes/kubernetes/pull/119839 and https://github.com/kubernetes/kubernetes/pull/119841 (backport of https://github.com/kubernetes/kubernetes/pull/118881 and https://github.com/kubernetes/kubernetes/pull/118879)
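
As a quick manual sanity check (a sketch, assuming access to a cluster that carries the changes listed above), the openapi/v3 discovery endpoint and openapi-backed explain can be exercised directly:

$ oc get --raw /openapi/v3 | jq -r '.paths | keys[]' | head
$ oc explain routes.spec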

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-18754. The following is the description of the original issue:

Description of problem:

After a control plane release upgrade, the 'tuned' pod in the guest cluster uses the control plane release image

Version-Release number of selected component (if applicable):

 

How reproducible:

always

Steps to Reproduce:

1. create a cluster in 4.14.0-0.ci-2023-09-06-180503
2. control plane release upgrade to 4.14-2023-09-07-180503
3. in the guest cluster check container image in pod tuned

Actual results:

pod tuned uses control plane release image 4.14-2023-09-07-180503

 

Expected results:

pod tuned uses release image 4.14.0-0.ci-2023-09-06-180503

Additional info:

After the control plane release upgrade, in the control plane namespace, cluster-node-tuning-operator uses the control plane release image:

jiezhao-mac:hypershift jiezhao$ oc get pods cluster-node-tuning-operator-6dc549ffdf-jhj2k -n clusters-jie-test -ojsonpath='{.spec.containers[].name}{"\n"}'
cluster-node-tuning-operator
jiezhao-mac:hypershift jiezhao$ oc get pods cluster-node-tuning-operator-6dc549ffdf-jhj2k -n clusters-jie-test -ojsonpath='{.spec.containers[].image}{"\n"}'
registry.ci.openshift.org/ocp/4.14-2023-09-07-180503@sha256:60bd6e2e8db761fb4b3b9d68c1da16bf0371343e3df8e72e12a2502640173990
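
The same check can be run against the guest cluster's tuned DaemonSet to see which image it was templated with (a sketch; it assumes the default DaemonSet name and namespace created by the node tuning operator):

$ oc --kubeconfig=<guest-kubeconfig> get daemonset tuned -n openshift-cluster-node-tuning-operator -ojsonpath='{.spec.template.spec.containers[0].image}{"\n"}'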

Description of problem:

The 'KnativeServing' global configuration entry is missing after the user has installed the 'Serverless' Operator successfully

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-13-223353

How reproducible:

Always

Steps to Reproduce:

1. Install the 'Serverless' Operator, make sure the Operator has been installed successfully and the Knative Serving instance is created without any error
2. Navigate to Administration -> Cluster Settings -> Global Configuration
3. Check whether KnativeServing is listed on the Cluster Settings page

Actual results:

KnativeServing is missing

Expected results:

KnativeServing should be listed on the Global Configuration page

Additional info:

 

Please review the following PR: https://github.com/openshift/cluster-capi-operator/pull/104

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-24186. The following is the description of the original issue:

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

This is a clone of issue OCPBUGS-20362. The following is the description of the original issue:

Description of problem:

Creating a PipelineRun with previous annotations leads to the result not being created, but the records are updated with new TaskRuns.

https://github.com/tektoncd/results/issues/556

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Install TektonResults on the cluster
2. Create a Pipeline and start the Pipeline
3. Rerun the PipelineRun
4. Check the records endpoint (see the example below), e.g. https://tekton-results-api-service-openshift-pipelines.apps.viraj-11-10-2023.devcluster.openshift.com/apis/results.tekton.dev/v1alpha2/parents/viraj/results/-/records
The new PipelineRun is not saved.
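
For example, the records endpoint from step 4 can be queried with a bearer token (a sketch; the route host is taken from this report, and the token handling is an assumption — the token needs read access to results and records):

$ curl -sk -H "Authorization: Bearer $(oc whoami -t)" \
  "https://tekton-results-api-service-openshift-pipelines.apps.viraj-11-10-2023.devcluster.openshift.com/apis/results.tekton.dev/v1alpha2/parents/viraj/results/-/records"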

Actual results:

The new PipelineRun created by the rerun is not saved in the records

Expected results:

All PipelineRuns should be saved in the records

Additional info:

Document to install TektonResults on the cluster https://gist.github.com/vikram-raj/257d672a38eb2159b0368eaed8f8970a

This is a clone of issue OCPBUGS-20246. The following is the description of the original issue:

Description of problem:

Installing an IPv6 agent-based hosted cluster in a disconnected environment. The hosted control plane is available, but when using its kubeconfig to run oc commands against the hosted cluster, I'm getting:

E1009 08:05:34.000946  115216 memcache.go:265] couldn't get current server API group list: Get "https://fd2e:6f44:5dd8::58:31765/api?timeout=32s": dial tcp [fd2e:6f44:5dd8::58]:31765: i/o timeout
  

Version-Release number of selected component (if applicable):

OCP 4.14.0-rc.4

How reproducible:

100%

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

I can use oc commands against the hosted cluster

Additional info:

 

Description of problem:
The dev console shows a list of samples. The user can create a sample based on a git repository. But some of these samples don't include a git repository reference and cannot be created.

Version-Release number of selected component (if applicable):
Tested different frontend versions against a 4.11 cluster; all of them (the oldest tested frontend was 4.8) show the sample without a git repository.

But the result also depends on the installed samples operator and installed ImageStreams.

How reproducible:
Always

Steps to Reproduce:

  1. Switch to the Developer perspective
  2. Navigate to Add > All Samples
  3. Search for Jboss
  4. Click on "JBoss EAP XP 4.0 with OpenJDK 11" (for example)

Actual results:
The git repository is not filled and the create button is disabled.

Expected results:
Samples without git repositories should not be displayed in the list.

Additional info:
The Git repository is saved as "sampleRepo" in the ImageStream tag section.
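
For reference, a minimal sketch of where that value lives (the ImageStream name, tag, and URL are illustrative; only the sampleRepo annotation key comes from this report):

apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: example-runtime
  namespace: openshift
spec:
  tags:
  - name: latest
    annotations:
      # the console reads this annotation to pre-fill the Git repository field
      sampleRepo: https://github.com/example/sample-app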

Description of problem:

CCPMSO uses a copy of the manifests from openshift/api. However, these appear out-of-sync with respect to the vendored version of openshift/api

Description of problem:

The agent-config-template creation command gives no INFO log in the output; however, it generates the file.

Version-Release number of selected component (if applicable):

v4.13

How reproducible:

$ openshift-install agent create agent-config-template --dir=./foo

Steps to Reproduce:

1.
2.
3.

Actual results:

$ openshift-install agent create agent-config-template --dir=./foo
INFO

Expected results:

 

Additional info:

$ openshift-install agent create agent-config-template --dir=./foo
INFO Created Agent Config Template in . directory

Description of problem:

Bump Kubernetes to 0.27.1 and bump dependencies
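
A minimal sketch of the kind of bump involved, assuming a Go module that consumes the k8s.io staging libraries (the exact set of modules to update depends on the repository):

go get k8s.io/api@v0.27.1 k8s.io/apimachinery@v0.27.1 k8s.io/client-go@v0.27.1
go mod tidy
go mod vendor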

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

When trying to deploy a HostedCluster using an IPv6 network, the control plane fails to start. These are the networking parameters for the HostedCluster:

  networking:
    clusterNetwork:
    - cidr: fd01::/48
    networkType: OVNKubernetes
    serviceNetwork:
    - cidr: fd02::/112


When the control plane pods are created, the etcd pod remains in CrashLoopBackOff. The error in the logs:

invalid value "https://fd01:0:0:3::4c:2380" for flag -listen-peer-urls: URL address does not have the form "host:port": https://fd01:0:0:3::4c:2380
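
For comparison, a URL with an IPv6 literal is only parseable as host:port when the address is bracketed, e.g. https://[fd01:0:0:3::4c]:2380, so the unbracketed value above is rejected by etcd's flag parsing.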

 

Version-Release number of selected component (if applicable):

Any

How reproducible:

Always

Steps to Reproduce:

1. Create a HostedCluster with the networking parameters set to IPv6 networks.
2. The etcd pod will be created and will fail to start.

Actual results:

etcd crashes at start

Expected results:

etcd starts properly and the other control plane pods follow

Additional info:

N/A

Description of problem:

We are not error checking the response when we request console plugins in getConsolePlugins. If this request fails, we still try to access the "Items" property of the response, which is nil, causing an exception to be thrown. We need to make sure the request succeeded before referencing any properties of the response.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Run bridge locally without setting the requisite env vars

Actual results:

A runtime exception is thrown from the getConsolePlugins function and bridge terminates

Expected results:

An error should be logged and bridge should continue to run

Additional info:

 

Description of problem:

After a component is ready, if we edit the component YAML from the console, it shows a stream of errors. The YAML does get updated, but the error goes away only after multiple reloads.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Deploy a pod/deployment
2. After they are seen ready, update the YAML from console
3. Error is seen

Actual results:

 

Expected results:

No error

Additional info:

 

This is a clone of issue OCPBUGS-22976. The following is the description of the original issue:

Description of problem:

A Github project with a Containerfile instead of a Dockerfile is not seen as a Buildah target, and the wizard falls through to templating as a standard (language) project.

Version-Release number of selected component (if applicable):


Server Version: 4.13.18
Kubernetes Version: v1.26.9+c7606e7

How reproducible:

Always

Steps to Reproduce:

1. Create a git application with Containerfile, e.g. https://github.com/cwilkers/jumble-c
2. Use the Developer view to add the app as a git repo
3. Observe failure as project is not built properly due to ignoring Containerfile

Actual results:

Build failure

Expected results:

Buildah uses the Containerfile, which includes the HTML and other resources required by the app

Additional info:

https://github.com/cwilkers/jumble-c

Description of problem:

When users try to Duplicate ClusterRoleBinding or Edit ClusterRoleBinding subject in the RHOCP web console, they get the error below:
" Error Loading : Name parameter invalid: "system%3Acontroller%3A<name-of-role-ref>": may not contain '%' "

Version-Release number of selected component (if applicable):

Tested in OCP 4.12.18

How reproducible:

Always

Steps to Reproduce:

1. Open OpenShift web console
2. Select project : Openshift
3. Under User management -> Click Rolebindings
4. Look for any RoleBinding having Role Ref with format `system:<name>` 
5. At the end of that line, click the 3-dot menu; the options below will be available:
- Duplicate ClusterRoleBinding
- Edit ClusterRoleBinding subject
6. Select/click on any of the option

Actual results:

After selecting Duplicate ClusterRoleBinding or Edit ClusterRoleBinding subject, the error below is shown:
Error Loading : Name parameter invalid: "system%3AXXX": may not contain '%'

Expected results:

After selecting Duplicate ClusterRoleBinding or Edit ClusterRoleBinding subject, the correct/expected web page must open.

Additional info:

When duplicating or editing the RoleBinding `registry-registry-role` with Role Ref `system:registry`, it works as expected.
When duplicating or editing the RoleBinding `system:sdn-readers` with Role Ref `system:sdn-reader`, the error below is shown:
Error Loading : Name parameter invalid: "system%3Asdn-readers": may not contain '%'

Duplicate ClusterRoleBinding or Edit ClusterRoleBinding subject works for only a few of the RoleBindings having a Role Ref of the form system:<name>.

Screenshots are attached here : https://drive.google.com/drive/folders/1QHpdensG2gKx0tSv1zkF7Qiyert6eaSg?usp=sharing
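
As a quick cross-check (assuming the RoleBinding from this report exists in the openshift namespace), the same name is accepted by the API when it is not percent-encoded, which suggests the console is passing the encoded form into the name parameter:

$ oc get rolebinding system:sdn-readers -n openshift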

This is a clone of issue OCPBUGS-19418. The following is the description of the original issue:

Description of problem:

OCP Upgrades fail with message "Upgrade error from 4.13.X: Unable to apply 4.14.0-X: an unknown error has occurred: MultipleErrors"

Version-Release number of selected component (if applicable):

Currently 4.14.0-rc.1, but we observed the same issue with previous 4.14 nightlies too: 
4.14.0-0.nightly-2023-09-12-195514
4.14.0-0.nightly-2023-09-02-132842
4.14.0-0.nightly-2023-08-28-154013

How reproducible:

1 out of 2 upgrades

Steps to Reproduce:

1. Deploy OCP 4.13 with latest GA on a baremetal cluster with IPI and OVN-K
2. Upgrade to latest 4.14 available
3. Check the cluster version status during the upgrade; at some point the upgrade stops with the message: "Upgrade error from 4.13.X Unable to apply 4.14.0-X: an unknown error has occurred: MultipleErrors"
4. Check the OVN pods ("oc get pods -n openshift-ovn-kubernetes"); there are pods running 7 out of 8 containers (missing ovnkube-node) that constantly restart, and pods running only 5 containers that show errors connecting to the OVN DBs.
5. Check the cluster operators ("oc get co"); mainly dns, network, and machine-config remain at 4.13 and degraded.

Actual results:

Upgrade not completed, and OVN pods remain in a restarting loop with failures.

Expected results:

Upgrade should be completed without issues, and OVN pods should remain in a Running status without restarts.

Additional info:

  • We have tested this with the latest GA versions of 4.13 (as of today, Sep 19: 4.13.13 to 4.14.0-rc1), but we have been observing this for the past 20 days with previous versions of 4.13 and 4.14.
  • Our deployments have single-stack IPv4, one NIC for provisioning and one NIC for baremetal (machine network).

These are the results from our latest test from 4.13.13 to 4.14.0-rc1

$ oc get clusterversion
NAME     VERSION  AVAILABLE  PROGRESSING  SINCE  STATUS
version           True       True         2h8m   Unable to apply 4.14.0-rc.1: an unknown error has occurred: MultipleErrors

$ oc get mcp
NAME    CONFIG                                            UPDATED  UPDATING  DEGRADED  MACHINECOUNT  READYMACHINECOUNT  UPDATEDMACHINECOUNT  DEGRADEDMACHINECOUNT  AGE
master  rendered-master-ebb1da47ad5cb76c396983decb7df1ea  True     False     False     3             3                  3                    0                     3h41m
worker  rendered-worker-26ccb35941236935a570dddaa0b699db  False    True      True      3             2                  2                    1                     3h41m

$ oc get co
NAME                                      VERSION      AVAILABLE  PROGRESSING  DEGRADED  SINCE
authentication                            4.14.0-rc.1  True       False        False     2h21m
baremetal                                 4.14.0-rc.1  True       False        False     3h38m
cloud-controller-manager                  4.14.0-rc.1  True       False        False     3h41m
cloud-credential                          4.14.0-rc.1  True       False        False     2h23m
cluster-autoscaler                        4.14.0-rc.1  True       False        False     2h21m
config-operator                           4.14.0-rc.1  True       False        False     3h40m
console                                   4.14.0-rc.1  True       False        False     2h20m
control-plane-machine-set                 4.14.0-rc.1  True       False        False     3h40m
csi-snapshot-controller                   4.14.0-rc.1  True       False        False     2h21m
dns                                       4.13.13      True       True         True      2h9m
etcd                                      4.14.0-rc.1  True       False        False     2h40m
image-registry                            4.14.0-rc.1  True       False        False     2h9m
ingress                                   4.14.0-rc.1  True       True         True      1h14m
insights                                  4.14.0-rc.1  True       False        False     3h34m
kube-apiserver                            4.14.0-rc.1  True       False        False     2h35m
kube-controller-manager                   4.14.0-rc.1  True       False        False     2h30m
kube-scheduler                            4.14.0-rc.1  True       False        False     2h29m
kube-storage-version-migrator             4.14.0-rc.1  False      True         False     2h9m
machine-api                               4.14.0-rc.1  True       False        False     2h24m
machine-approver                          4.14.0-rc.1  True       False        False     3h40m
machine-config                            4.13.13      True       False        True      59m
marketplace                               4.14.0-rc.1  True       False        False     3h40m
monitoring                                4.14.0-rc.1  False      True         True      2h3m
network                                   4.13.13      True       True         True      2h4m
node-tuning                               4.14.0-rc.1  True       False        False     2h9m
openshift-apiserver                       4.14.0-rc.1  True       False        False     2h20m
openshift-controller-manager              4.14.0-rc.1  True       False        False     2h20m
openshift-samples                         4.14.0-rc.1  True       False        False     2h23m
operator-lifecycle-manager                4.14.0-rc.1  True       False        False     2h23m
operator-lifecycle-manager-catalog        4.14.0-rc.1  True       False        False     2h18m
operator-lifecycle-manager-packageserver  4.14.0-rc.1  True       False        False     2h20m
service-ca                                4.14.0-rc.1  True       False        False     2h23m
storage                                   4.14.0-rc.1  True       False        False     3h40m

Some OVN pods are running 7 out of 8 containers (missing ovnkube-node) and constantly restarting, and the pods running only 5 containers show errors connecting to the OVN DBs.

$ oc get pods -n openshift-ovn-kubernetes -o wide
NAME                                    READY  STATUS   RESTARTS  AGE    IP             NODE
ovnkube-control-plane-5f5c598768-czkjv  2/2    Running  0         2h16m  192.168.16.32  dciokd-master-1
ovnkube-control-plane-5f5c598768-kg69r  2/2    Running  0         2h16m  192.168.16.31  dciokd-master-0
ovnkube-control-plane-5f5c598768-prfb5  2/2    Running  0         2h16m  192.168.16.33  dciokd-master-2
ovnkube-node-9hjv9                      5/5    Running  1         3h43m  192.168.16.32  dciokd-master-1
ovnkube-node-fmswc                      7/8    Running  19        2h10m  192.168.16.36  dciokd-worker-2
ovnkube-node-pcjhp                      7/8    Running  20        2h15m  192.168.16.35  dciokd-worker-1
ovnkube-node-q7kcj                      5/5    Running  1         3h43m  192.168.16.33  dciokd-master-2
ovnkube-node-qsngm                      5/5    Running  3         3h27m  192.168.16.34  dciokd-worker-0
ovnkube-node-v2d4h                      7/8    Running  20        2h15m  192.168.16.31  dciokd-master-0

$ oc logs ovnkube-node-9hjv9 -c ovnkube-node -n openshift-ovn-kubernetes | less
...
2023-09-19T03:40:23.112699529Z E0919 03:40:23.112660    5883 ovn_db.go:511] Failed to retrieve cluster/status info for database "OVN_Northbound", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnnb_db.ctl
2023-09-19T03:40:23.112699529Z ovn-appctl: cannot connect to "/var/run/ovn/ovnnb_db.ctl" (No such file or directory)
2023-09-19T03:40:23.112699529Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=5 cluster/status OVN_Northbound' failed: exit status 1)
2023-09-19T03:40:23.112699529Z E0919 03:40:23.112677    5883 ovn_db.go:590] OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=5 cluster/status OVN_Northbound' failed: exit status 1
2023-09-19T03:40:23.114791313Z E0919 03:40:23.114777    5883 ovn_db.go:283] Failed retrieving memory/show output for "OVN_NORTHBOUND", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnnb_db.ctl
2023-09-19T03:40:23.114791313Z ovn-appctl: cannot connect to "/var/run/ovn/ovnnb_db.ctl" (No such file or directory)
2023-09-19T03:40:23.114791313Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnnb_db.ctl --timeout=5 memory/show' failed: exit status 1)
2023-09-19T03:40:23.116492808Z E0919 03:40:23.116478    5883 ovn_db.go:511] Failed to retrieve cluster/status info for database "OVN_Southbound", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnsb_db.ctl
2023-09-19T03:40:23.116492808Z ovn-appctl: cannot connect to "/var/run/ovn/ovnsb_db.ctl" (No such file or directory)
2023-09-19T03:40:23.116492808Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=5 cluster/status OVN_Southbound' failed: exit status 1)
2023-09-19T03:40:23.116492808Z E0919 03:40:23.116488    5883 ovn_db.go:590] OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=5 cluster/status OVN_Southbound' failed: exit status 1
2023-09-19T03:40:23.118468064Z E0919 03:40:23.118450    5883 ovn_db.go:283] Failed retrieving memory/show output for "OVN_SOUTHBOUND", stderr: 2023-09-19T03:40:23Z|00001|unixctl|WARN|failed to connect to /var/run/ovn/ovnsb_db.ctl
2023-09-19T03:40:23.118468064Z ovn-appctl: cannot connect to "/var/run/ovn/ovnsb_db.ctl" (No such file or directory)
2023-09-19T03:40:23.118468064Z , err: (OVN command '/usr/bin/ovn-appctl -t /var/run/ovn/ovnsb_db.ctl --timeout=5 memory/show' failed: exit status 1)
2023-09-19T03:40:25.118085671Z E0919 03:40:25.118056    5883 ovn_northd.go:128] Failed to get ovn-northd status stderr() :(failed to run the command since failed to get ovn-northd's pid: open /var/run/ovn/ovn-northd.pid: no such file or directory)

Description of problem:

On a 4.12.0-0.nightly-2022-09-08-114806 AWS cluster, "remote error: tls: bad certificate" appears in the prometheus-operator-admission-webhook logs. This looks like a regression (there is no such issue in 4.11); the defect does not block the function, and the requests seem to come from AWS.

$ oc -n openshift-monitoring get pod | grep prometheus-operator-admission-webhook
prometheus-operator-admission-webhook-7d8fd8b5bb-kjh4f   1/1     Running   0          3h
prometheus-operator-admission-webhook-7d8fd8b5bb-whl5n   1/1     Running   0          3h

$ oc -n openshift-monitoring logs prometheus-operator-admission-webhook-7d8fd8b5bb-kjh4f
level=info ts=2022-09-08T23:32:53.782445094Z caller=main.go:130 address=[::]:8443 msg="Starting TLS enabled server"
ts=2022-09-08T23:33:09.057366056Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:52820: remote error: tls: bad certificate"
ts=2022-09-08T23:33:10.071639453Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:52830: remote error: tls: bad certificate"
ts=2022-09-08T23:33:12.07959313Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:52842: remote error: tls: bad certificate"
ts=2022-09-08T23:33:31.729332249Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:39188: remote error: tls: bad certificate"
ts=2022-09-08T23:33:32.7374936Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:39196: remote error: tls: bad certificate"
ts=2022-09-08T23:33:34.745945871Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:39206: remote error: tls: bad certificate"
ts=2022-09-08T23:33:57.460069283Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:37500: remote error: tls: bad certificate"
ts=2022-09-08T23:33:58.469984958Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:37508: remote error: tls: bad certificate"
ts=2022-09-08T23:34:00.479578826Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:40948: remote error: tls: bad certificate"
ts=2022-09-08T23:36:22.861562723Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:53866: remote error: tls: bad certificate"
ts=2022-09-08T23:36:24.870186206Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:53882: remote error: tls: bad certificate"
ts=2022-09-08T23:39:43.613375962Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:38780: remote error: tls: bad certificate"
ts=2022-09-08T23:39:45.621205524Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:38792: remote error: tls: bad certificate"
ts=2022-09-08T23:46:03.653578785Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:57878: remote error: tls: bad certificate"
ts=2022-09-08T23:46:05.662237056Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:57890: remote error: tls: bad certificate"
ts=2022-09-08T23:49:08.643599472Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:48340: remote error: tls: bad certificate"
ts=2022-09-08T23:52:08.809838473Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:51682: remote error: tls: bad certificate"
ts=2022-09-08T23:52:09.817050146Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:51698: remote error: tls: bad certificate"
ts=2022-09-08T23:55:11.862993344Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:54280: remote error: tls: bad certificate"
ts=2022-09-08T23:58:15.820629264Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:59462: remote error: tls: bad certificate"
ts=2022-09-09T00:01:17.913920461Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:47320: remote error: tls: bad certificate"
ts=2022-09-09T00:04:21.086495988Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:52438: remote error: tls: bad certificate"
ts=2022-09-09T00:07:24.050365477Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:55148: remote error: tls: bad certificate"
ts=2022-09-09T00:07:27.066559749Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:55168: remote error: tls: bad certificate"
ts=2022-09-09T00:10:28.193017562Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:42222: remote error: tls: bad certificate"
ts=2022-09-09T00:10:30.201598245Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:59802: remote error: tls: bad certificate"
ts=2022-09-09T00:13:30.282592276Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:45648: remote error: tls: bad certificate"
ts=2022-09-09T00:13:31.290450933Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:45654: remote error: tls: bad certificate"
ts=2022-09-09T00:13:33.298604517Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:45668: remote error: tls: bad certificate"
ts=2022-09-09T00:16:33.274732648Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:56710: remote error: tls: bad certificate"
ts=2022-09-09T00:19:39.47117325Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:54978: remote error: tls: bad certificate"
ts=2022-09-09T00:25:43.708275724Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:54638: remote error: tls: bad certificate"
ts=2022-09-09T00:28:46.627225713Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:58124: remote error: tls: bad certificate"
ts=2022-09-09T00:28:48.63515681Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:39454: remote error: tls: bad certificate"
ts=2022-09-09T00:31:51.728153893Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:56894: remote error: tls: bad certificate"
ts=2022-09-09T00:34:52.775067246Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:34884: remote error: tls: bad certificate"
ts=2022-09-09T00:41:00.843743907Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:41784: remote error: tls: bad certificate"
ts=2022-09-09T00:44:00.933970145Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:36150: remote error: tls: bad certificate"
ts=2022-09-09T00:44:03.949135311Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:36166: remote error: tls: bad certificate"
ts=2022-09-09T00:47:03.97630552Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:44732: remote error: tls: bad certificate"
ts=2022-09-09T00:47:06.991580657Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:44748: remote error: tls: bad certificate"
ts=2022-09-09T00:50:08.31637565Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:54092: remote error: tls: bad certificate"
ts=2022-09-09T00:53:11.264559449Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:43144: remote error: tls: bad certificate"
ts=2022-09-09T00:59:16.306282415Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:39864: remote error: tls: bad certificate"
ts=2022-09-09T00:59:17.314074479Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:39878: remote error: tls: bad certificate"
ts=2022-09-09T00:59:19.32313415Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:56104: remote error: tls: bad certificate"
ts=2022-09-09T01:08:25.613927992Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:44280: remote error: tls: bad certificate"
ts=2022-09-09T01:08:26.622625145Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:44290: remote error: tls: bad certificate"
ts=2022-09-09T01:08:28.631034721Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:48838: remote error: tls: bad certificate"
ts=2022-09-09T01:11:28.704732265Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:37372: remote error: tls: bad certificate"
ts=2022-09-09T01:11:31.723552093Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:37392: remote error: tls: bad certificate"
ts=2022-09-09T01:17:34.794690109Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:46750: remote error: tls: bad certificate"
ts=2022-09-09T01:17:35.803918438Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:46752: remote error: tls: bad certificate"
ts=2022-09-09T01:17:37.812700046Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:46768: remote error: tls: bad certificate"
ts=2022-09-09T01:20:38.79326772Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:53880: remote error: tls: bad certificate"
ts=2022-09-09T01:23:41.073187846Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:46086: remote error: tls: bad certificate"
ts=2022-09-09T01:23:44.088529273Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:46090: remote error: tls: bad certificate"
ts=2022-09-09T01:26:44.077154097Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:54234: remote error: tls: bad certificate"
ts=2022-09-09T01:26:45.085277729Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:54248: remote error: tls: bad certificate"
ts=2022-09-09T01:26:47.092797767Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:54254: remote error: tls: bad certificate"
ts=2022-09-09T01:29:48.255127155Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:39536: remote error: tls: bad certificate"
ts=2022-09-09T01:29:50.263225272Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:56030: remote error: tls: bad certificate"
ts=2022-09-09T01:32:51.618334928Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:42836: remote error: tls: bad certificate"
ts=2022-09-09T01:32:53.627565113Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:42844: remote error: tls: bad certificate"
ts=2022-09-09T01:35:56.945306145Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:57828: remote error: tls: bad certificate"
ts=2022-09-09T01:38:57.721110974Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:54038: remote error: tls: bad certificate"
ts=2022-09-09T01:41:59.901865996Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:46096: remote error: tls: bad certificate"
ts=2022-09-09T01:42:00.903596845Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:46102: remote error: tls: bad certificate"
ts=2022-09-09T01:45:03.034044637Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:55868: remote error: tls: bad certificate"
ts=2022-09-09T01:45:04.042270514Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:55874: remote error: tls: bad certificate"
ts=2022-09-09T01:45:06.05067642Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:55888: remote error: tls: bad certificate"
ts=2022-09-09T01:48:06.178001976Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:56024: remote error: tls: bad certificate"
ts=2022-09-09T01:48:09.192075072Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:37562: remote error: tls: bad certificate"
ts=2022-09-09T01:51:10.203900665Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:33016: remote error: tls: bad certificate"
ts=2022-09-09T01:51:12.212458619Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:33022: remote error: tls: bad certificate"
ts=2022-09-09T01:54:13.294550312Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:38042: remote error: tls: bad certificate"
ts=2022-09-09T01:57:15.292731466Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:43838: remote error: tls: bad certificate"
ts=2022-09-09T02:00:19.408152102Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:42838: remote error: tls: bad certificate"
ts=2022-09-09T02:00:21.41717724Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:42842: remote error: tls: bad certificate"
ts=2022-09-09T02:03:21.342937844Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:55026: remote error: tls: bad certificate"
ts=2022-09-09T02:03:22.350450637Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:55034: remote error: tls: bad certificate"
ts=2022-09-09T02:06:25.421123942Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:34882: remote error: tls: bad certificate"
ts=2022-09-09T02:06:27.428721002Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:34884: remote error: tls: bad certificate"
ts=2022-09-09T02:09:28.541378288Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:52888: remote error: tls: bad certificate"
ts=2022-09-09T02:12:31.610427648Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:47430: remote error: tls: bad certificate"
ts=2022-09-09T02:12:33.618581498Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:47434: remote error: tls: bad certificate"
ts=2022-09-09T02:15:33.601606956Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:37706: remote error: tls: bad certificate"
ts=2022-09-09T02:15:36.617807944Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:37730: remote error: tls: bad certificate"
ts=2022-09-09T02:18:37.815046583Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:45066: remote error: tls: bad certificate"
ts=2022-09-09T02:18:39.822858743Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:39614: remote error: tls: bad certificate"
ts=2022-09-09T02:21:40.885368415Z caller=stdlib.go:105 caller=server.go:3195 msg="http: TLS handshake error from 10.128.0.9:42250: remote error: tls: bad certificate"

Version-Release number of selected component (if applicable):

"remote error: tls: bad certificate" is in prometheus-operator-admission-webhook logs

How reproducible:

always

Steps to Reproduce:

1. check prometheus-operator-admission-webhook logs.

Actual results:

"remote error: tls: bad certificate" is in prometheus-operator-admission-webhook logs

Expected results:

no error logs

Additional info:

 

 

Description of problem:

Install failed with External platform type

Version-Release number of selected component (if applicable):

4.14.0-0.ci-2023-03-07-170635
as there is no 4.14 nightly build available, the CI build is used

How reproducible:

Always

Steps to Reproduce:

1. Set up a UPI vSphere cluster with the platform set to External

2. Install fails

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion               
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          141m    Unable to apply 4.14.0-0.ci-2023-03-07-170635: the cluster operator cloud-controller-manager is not available
liuhuali@Lius-MacBook-Pro huali-test % oc get co                           
NAME                                       VERSION                         AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.0-0.ci-2023-03-07-170635   True        False         False      118m    
baremetal                                  4.14.0-0.ci-2023-03-07-170635   True        False         False      137m    
cloud-controller-manager                   4.14.0-0.ci-2023-03-07-170635                                                
cloud-credential                           4.14.0-0.ci-2023-03-07-170635   True        False         False      140m    
cluster-autoscaler                         4.14.0-0.ci-2023-03-07-170635   True        False         False      137m    
config-operator                            4.14.0-0.ci-2023-03-07-170635   True        False         False      139m    
console                                    4.14.0-0.ci-2023-03-07-170635   True        False         False      124m    
control-plane-machine-set                  4.14.0-0.ci-2023-03-07-170635   True        False         False      137m    
csi-snapshot-controller                    4.14.0-0.ci-2023-03-07-170635   True        False         False      138m    
dns                                        4.14.0-0.ci-2023-03-07-170635   True        False         False      137m    
etcd                                       4.14.0-0.ci-2023-03-07-170635   True        False         False      137m    
image-registry                             4.14.0-0.ci-2023-03-07-170635   True        False         False      127m    
ingress                                    4.14.0-0.ci-2023-03-07-170635   True        False         False      126m    
insights                                   4.14.0-0.ci-2023-03-07-170635   True        False         False      132m    
kube-apiserver                             4.14.0-0.ci-2023-03-07-170635   True        False         False      134m    
kube-controller-manager                    4.14.0-0.ci-2023-03-07-170635   True        False         False      136m    
kube-scheduler                             4.14.0-0.ci-2023-03-07-170635   True        False         False      135m    
kube-storage-version-migrator              4.14.0-0.ci-2023-03-07-170635   True        False         False      138m    
machine-api                                4.14.0-0.ci-2023-03-07-170635   True        False         False      137m    
machine-approver                           4.14.0-0.ci-2023-03-07-170635   True        False         False      138m    
machine-config                             4.14.0-0.ci-2023-03-07-170635   True        False         False      136m    
marketplace                                4.14.0-0.ci-2023-03-07-170635   True        False         False      137m    
monitoring                                 4.14.0-0.ci-2023-03-07-170635   True        False         False      124m    
network                                    4.14.0-0.ci-2023-03-07-170635   True        False         False      139m    
node-tuning                                4.14.0-0.ci-2023-03-07-170635   True        False         False      137m    
openshift-apiserver                        4.14.0-0.ci-2023-03-07-170635   True        False         False      132m    
openshift-controller-manager               4.14.0-0.ci-2023-03-07-170635   True        False         False      138m    
openshift-samples                          4.14.0-0.ci-2023-03-07-170635   True        False         False      131m    
operator-lifecycle-manager                 4.14.0-0.ci-2023-03-07-170635   True        False         False      138m    
operator-lifecycle-manager-catalog         4.14.0-0.ci-2023-03-07-170635   True        False         False      138m    
operator-lifecycle-manager-packageserver   4.14.0-0.ci-2023-03-07-170635   True        False         False      132m    
service-ca                                 4.14.0-0.ci-2023-03-07-170635   True        False         False      138m    
storage                                    4.14.0-0.ci-2023-03-07-170635   True        False         False      138m    
liuhuali@Lius-MacBook-Pro huali-test % oc get infrastructure cluster -oyaml
apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-03-08T07:46:07Z"
  generation: 1
  name: cluster
  resourceVersion: "527"
  uid: 096a54bc-8a35-4071-b750-cfac439c1916
spec:
  cloudConfig:
    name: ""
  platformSpec:
    external:
      platformName: vSphere
    type: External
status:
  apiServerInternalURI: https://api-int.huliu-vs8x.qe.devcluster.openshift.com:6443
  apiServerURL: https://api.huliu-vs8x.qe.devcluster.openshift.com:6443
  controlPlaneTopology: HighlyAvailable
  etcdDiscoveryDomain: ""
  infrastructureName: huliu-vs8x-fk79b
  infrastructureTopology: HighlyAvailable
  platform: External
  platformStatus:
    external: {}
    type: External
liuhuali@Lius-MacBook-Pro huali-test % 

Actual results:

Install failed; the cluster operator cloud-controller-manager is not available.

Expected results:

Install succeeds

Additional info:

This is for testing https://issues.redhat.com/browse/OCPCLOUD-1772

Description of problem:
When trying to import the Helm chart "httpd-imagestreams", the "Create Helm Release" page shows an info alert that the form isn't available because there is no schema for this Helm chart. But the YAML view is also not visible.

Info Alert:

Form view is disabled for this chart because the schema is not available

Version-Release number of selected component (if applicable):
4.9-4.14 (current master)

How reproducible:
Always

Steps to Reproduce:

  1. Switch to the developer perspective
  2. Navigate to Add > Helm Chart
  3. Search and select "httpd-imagestreams", click the card and then Create to open the "Create Helm Release" page

Actual results:

  1. Form / YAML switch is disabled
  2. Info alert is shown: Form view is disabled for this chart because the schema is not available
  3. There is no YAML editor

Expected results:

  1. It's fine that the Form/ YAML switch is disabled
  2. Info alert is also fine
  3. YAML editor should be displayed

Additional info:
The chart yaml is available here and doesn't contain a schema (at the moment).

https://github.com/openshift-helm-charts/charts/blob/main/charts/redhat/redhat/httpd-imagestreams/0.0.1/src/Chart.yaml

Description of problem:

When viewing the ServiceMonitor schema in the YAML sidebar, many fields whose type is Object don't have a 'View details' button in the console to show more details

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-12-044657

How reproducible:

Always

Steps to Reproduce:

1. Go to any ServiceMonitor YAML page, open the schema by clicking 'View sidebar',
then click 'View details' on 'spec' -> 'View details' on 'endpoints'
2. Check object and array type schema fields:
spec.endpoints.authorization
spec.endpoints.basicAuth
spec.endpoints.bearerTokenSecret
spec.endpoints.oauth2
spec.endpoints.params
spec.endpoints.tlsConfig
spec.endpoints.relabelings

Actual results:

2. There is no 'View details' button for these 'object' and 'array' type fields

Expected results:

2. We should provide a 'View details' link for 'object' and 'array' fields so that the user is able to view more details

For example
$ oc explain servicemonitors.spec.endpoints.tlsConfig
KIND:     ServiceMonitor
VERSION:  monitoring.coreos.com/v1

RESOURCE: tlsConfig <Object>

DESCRIPTION:
     TLS configuration to use when scraping the endpoint

FIELDS:
   ca                   <Object>
     Certificate authority used when verifying server certificates.

   caFile               <string>
     Path to the CA cert in the Prometheus container to use for the targets.

   cert                 <Object>
     Client certificate to present when doing client-authentication.

   certFile             <string>
     Path to the client cert file in the Prometheus container for the targets.

   insecureSkipVerify   <boolean>
     Disable target certificate validation.

   keyFile              <string>
     Path to the client key file in the Prometheus container for the targets.

   keySecret            <Object>
     Secret containing the client key file for the targets.

   serverName           <string>
     Used to verify the hostname for the targets.


$ oc explain servicemonitors.spec.endpoints.relabelings
KIND:     ServiceMonitor
VERSION:  monitoring.coreos.com/v1

RESOURCE: relabelings <[]Object>

DESCRIPTION:
     RelabelConfigs to apply to samples before scraping. Prometheus Operator
     automatically adds relabelings for a few standard Kubernetes fields. The
     original scrape job's name is available via the `__tmp_prometheus_job_name`
     label. More info:
     https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config

     RelabelConfig allows dynamic rewriting of the label set, being applied to
     samples before ingestion. It defines `<metric_relabel_configs>`-section of
     Prometheus configuration. More info:
     https://prometheus.io/docs/prometheus/latest/configuration/configuration/#metric_relabel_configs

FIELDS:
   action         <string>
     Action to perform based on regex matching. Default is 'replace'. uppercase
     and lowercase actions require Prometheus >= 2.36.

   modulus        <integer>
     Modulus to take of the hash of the source label values.

   regex          <string>
     Regular expression against which the extracted value is matched. Default is
     '(.*)'

   replacement    <string>
     Replacement value against which a regex replace is performed if the regular
     expression matches. Regex capture groups are available. Default is '$1'

   separator      <string>
     Separator placed between concatenated source label values. default is ';'.

   sourceLabels   <[]string>
     The source labels select values from existing labels. Their content is
     concatenated using the configured separator and matched against the
     configured regular expression for the replace, keep, and drop actions.

   targetLabel    <string>
     Label to which the resulting value is written in a replace action. It is
     mandatory for replace actions. Regex capture groups are available.

Additional info:

 

 

After installation with the assisted installer, the cluster contains BareMetalHost CRs (in the 'unmanaged' state) generated by assisted. These CRs include HardwareDetails data captured from the assisted-installer-agent.
Likely due to misleading documentation in Metal³ (since fixed by https://github.com/metal3-io/baremetal-operator/pull/657), the name field of storage devices is set to a name like sda instead of what Metal³'s own inspection would set it to, which is /dev/sda. This field is meant to be round-trippable to the rootDeviceHints, and as things stand it is not.
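
A sketch of the mismatch on such a BareMetalHost (values are illustrative; field names follow the Metal³ BareMetalHost API):

status:
  hardware:
    storage:
    - name: sda                  # as recorded by the assisted-installer-agent
      sizeBytes: 480103981056
spec:
  rootDeviceHints:
    deviceName: /dev/sda         # the form Metal³ inspection reports and the hints expect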

Description of problem:

The installer, as used with AWS, does a get-all-roles during a cluster destroy and deletes roles based on a tag. If a customer is using AWS SEA, which denies get-all-roles in the AWS account, the installer fails.

Instead of erroring out, the installer should gracefully handle being denied get-all-roles and move on, so that a denying SCP does not get in the way of a successful cluster destroy on AWS.

Version-Release number of selected component (if applicable):

[ec2-user@ip-172-16-32-144 ~]$ rosa version
1.2.6

How reproducible:

1. Deploy ROSA STS, private with PrivateLink with AWS SEA
2. rosa delete cluster --debug
3. watch the debug logs of the installer to see it try to get-all-roles
4. installer fails when the SCP from AWS SEA denies the get-all-roles task

Steps to Reproduce:

Steps listed above.

Actual results:

time="2022-09-01T00:10:40Z" level=error msg="error after waiting for command completion" error="exit status 4" installID=zp56pxql
time="2022-09-01T00:10:40Z" level=error msg="error provisioning cluster" error="exit status 4" installID=zp56pxql
time="2022-09-01T00:10:40Z" level=error msg="error running openshift-install, running deprovision to clean up" error="exit status 4" installID=zp56pxql


time="2022-09-01T00:12:47Z" level=info msg="copied /installconfig/install-config.yaml to /output/install-config.yaml" installID=55h2cvl5
time="2022-09-01T00:12:47Z" level=info msg="cleaning up resources from previous provision attempt" installID=55h2cvl5
time="2022-09-01T00:12:47Z" level=debug msg="search for matching resources by tag in ca-central-1 matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:12:48Z" level=debug msg="search for matching resources by tag in us-east-1 matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:12:48Z" level=debug msg="search for IAM roles" installID=55h2cvl5
time="2022-09-01T00:12:49Z" level=debug msg="iterating over a page of 64 IAM roles" installID=55h2cvl5
time="2022-09-01T00:12:52Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-ConfigRecorderRole-B749E1E6: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-ConfigRecorderRole-B749E1E6 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 6b4b5144-2f4e-4fde-ba1a-04ed239b84c2" installID=55h2cvl5
time="2022-09-01T00:12:52Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-CWL-Add-Subscription-Filter-9D3CF73C: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-CWL-Add-Subscription-Filter-9D3CF73C with an explicit deny in a service control policy\n\tstatus code: 403, request id: 6152e9c2-9c1c-478b-a5e3-11ff2508684e" installID=55h2cvl5
time="2022-09-01T00:12:52Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-S4CHZ22EC1B2: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-S4CHZ22EC1B2 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 8636f0ff-e984-4f02-870e-52170ab4e7bb" installID=55h2cvl5
time="2022-09-01T00:12:52Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-X9UQK0CYNPPO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-X9UQK0CYNPPO with an explicit deny in a service control policy\n\tstatus code: 403, request id: 2385a980-dc9b-480f-955a-62ac1aaa6718" installID=55h2cvl5
time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomCentralEndpointDep-1H6K6CZ6AEUBO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomCentralEndpointDep-1H6K6CZ6AEUBO with an explicit deny in a service control policy\n\tstatus code: 403, request id: 02ccef62-14e7-4310-b254-a0731995bd45" installID=55h2cvl5
time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomCreateSSMDocument7-1JDO2BN7QTXRH: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomCreateSSMDocument7-1JDO2BN7QTXRH with an explicit deny in a service control policy\n\tstatus code: 403, request id: eca2081d-abd7-4c9b-b531-27ca8758f933" installID=55h2cvl5
time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomEBSDefaultEncrypti-19EVAXFRG2BEJ: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomEBSDefaultEncrypti-19EVAXFRG2BEJ with an explicit deny in a service control policy\n\tstatus code: 403, request id: 6bda17e9-83e5-4688-86a0-2f84c77db759" installID=55h2cvl5
time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomEc2OperationsB1799-1WASK5J6GUYHO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomEc2OperationsB1799-1WASK5J6GUYHO with an explicit deny in a service control policy\n\tstatus code: 403, request id: 827afa4a-8bb9-4e1e-af69-d5e8d125003a" installID=55h2cvl5
time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomGetDetectorIdRole6-9VGPM8U0HMV7: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomGetDetectorIdRole6-9VGPM8U0HMV7 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 8dcd0480-6f9e-49cb-a0dd-0c5f76107696" installID=55h2cvl5
time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomGuardDutyCreatePub-1W03UREYK3KTX: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomGuardDutyCreatePub-1W03UREYK3KTX with an explicit deny in a service control policy\n\tstatus code: 403, request id: 5095aed7-45de-4ca0-8c41-9db9e78ca5a6" installID=55h2cvl5
time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMCreateRoleE62B6-1AQL8IBN9938I: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMCreateRoleE62B6-1AQL8IBN9938I with an explicit deny in a service control policy\n\tstatus code: 403, request id: 04f7d0e0-4139-4f74-8f67-8d8a8a41d6b9" installID=55h2cvl5
time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMPasswordPolicyC-16TPLHRY1FZ43: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMPasswordPolicyC-16TPLHRY1FZ43 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 115f9514-b78b-42d1-b008-dc3181b61d33" installID=55h2cvl5
time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsLogGroup49AC86-1D03LOLE2CARP: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsLogGroup49AC86-1D03LOLE2CARP with an explicit deny in a service control policy\n\tstatus code: 403, request id: 68da4d93-a93e-410a-b3af-961122fe8df0" installID=55h2cvl5
time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsMetricFilter7F-DLA5E1PZSFHH: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsMetricFilter7F-DLA5E1PZSFHH with an explicit deny in a service control policy\n\tstatus code: 403, request id: 012221ea-2121-4b04-91f2-26c31c8458b1" installID=55h2cvl5
time="2022-09-01T00:12:53Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieExportConfigR-1QT1WNNWPSL36: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieExportConfigR-1QT1WNNWPSL36 with an explicit deny in a service control policy\n\tstatus code: 403, request id: e6c9328d-a4b9-4e69-8194-a68ed7af6c73" installID=55h2cvl5
time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieUpdateSession-1NHBPTB4GOSM8: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieUpdateSession-1NHBPTB4GOSM8 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 214ca7fb-d153-4d0d-9f9c-21b073c5bd35" installID=55h2cvl5
time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomResourceCleanupC59-1MSCB57N479UU: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomResourceCleanupC59-1MSCB57N479UU with an explicit deny in a service control policy\n\tstatus code: 403, request id: 63b54e82-e2f6-48d4-bd0f-d2663bbc58bf" installID=55h2cvl5
time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomS3PutReplicationRo-FE5Q26BTAG9K: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomS3PutReplicationRo-FE5Q26BTAG9K with an explicit deny in a service control policy\n\tstatus code: 403, request id: d24982b6-df65-4ba2-a3c0-5ac8d23947e1" installID=55h2cvl5
time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomSecurityHubRole660-1UX115B9Q68WX: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomSecurityHubRole660-1UX115B9Q68WX with an explicit deny in a service control policy\n\tstatus code: 403, request id: e2c5737a-5014-4eb5-9150-1dd1939137c0" installID=55h2cvl5
time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomSSMUpdateRoleD3D5C-AZ9GBJG6UM4F: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomSSMUpdateRoleD3D5C-AZ9GBJG6UM4F with an explicit deny in a service control policy\n\tstatus code: 403, request id: 7793fa7c-4c8d-4f9f-8f23-d393b85be97c" installID=55h2cvl5
time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomVpcDefaultSecurity-HC931RYMVKKC: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomVpcDefaultSecurity-HC931RYMVKKC with an explicit deny in a service control policy\n\tstatus code: 403, request id: bef2c5ab-ef59-4be6-bf1a-2d89fddb90f1" installID=55h2cvl5
time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-DefaultBucketReplication-OIM43YBJSMGD: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-DefaultBucketReplication-OIM43YBJSMGD with an explicit deny in a service control policy\n\tstatus code: 403, request id: ff04eb1b-9cf6-4fff-a503-d9292ff17ccd" installID=55h2cvl5
time="2022-09-01T00:12:54Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-PipelineRole: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-PipelineRole with an explicit deny in a service control policy\n\tstatus code: 403, request id: 85e05de8-ba16-4366-bc86-721da651d770" installID=55h2cvl5
time="2022-09-01T00:12:56Z" level=info msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-VPC-FlowLog-519F0B57: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-VPC-FlowLog-519F0B57 with an explicit deny in a service control policy\n\tstatus code: 403, request id: a9d864e4-cfdf-483d-a0d2-9b48a117abc4" installID=55h2cvl5
time="2022-09-01T00:12:56Z" level=debug msg="search for IAM users" installID=55h2cvl5
time="2022-09-01T00:12:56Z" level=debug msg="iterating over a page of 0 IAM users" installID=55h2cvl5
time="2022-09-01T00:12:56Z" level=debug msg="search for IAM instance profiles" installID=55h2cvl5
time="2022-09-01T00:12:56Z" level=info msg="error while finding resources to delete" error="get tags for arn:aws:iam::646284873784:role/PBMMAccel-VPC-FlowLog-519F0B57: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-VPC-FlowLog-519F0B57 with an explicit deny in a service control policy\n\tstatus code: 403, request id: a9d864e4-cfdf-483d-a0d2-9b48a117abc4" installID=55h2cvl5
time="2022-09-01T00:12:56Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:12:57Z" level=info msg=Disassociated id=i-03d7570547d32071d installID=55h2cvl5 name=rosa-mv9dx3-xls7g-master-profile role=ROSA-ControlPlane-Role
time="2022-09-01T00:12:57Z" level=info msg=Deleted InstanceProfileName=rosa-mv9dx3-xls7g-master-profile arn="arn:aws:iam::646284873784:instance-profile/rosa-mv9dx3-xls7g-master-profile" id=i-03d7570547d32071d installID=55h2cvl5
time="2022-09-01T00:12:57Z" level=debug msg=Terminating id=i-03d7570547d32071d installID=55h2cvl5
time="2022-09-01T00:12:58Z" level=debug msg=Terminating id=i-08bee3857e5265ba4 installID=55h2cvl5
time="2022-09-01T00:12:58Z" level=debug msg=Terminating id=i-00df6e7b34aa65c9b installID=55h2cvl5
time="2022-09-01T00:13:08Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:13:18Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:13:28Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:13:38Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:13:48Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:13:58Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:14:08Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:14:18Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:14:28Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:14:38Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:14:48Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:14:58Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:15:08Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:15:18Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:15:28Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:15:38Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:15:48Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:15:58Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:16:08Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:16:18Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:16:28Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:16:38Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:16:48Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:16:58Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:17:08Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:17:18Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:17:28Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:17:38Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:17:48Z" level=debug msg="search for instances by tag matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:17:49Z" level=info msg=Deleted id=rosa-mv9dx3-xls7g-sint/2e99b98b94304d80 installID=55h2cvl5
time="2022-09-01T00:17:49Z" level=info msg=Deleted id=eni-0e4ee5cf8f9a8fdd2 installID=55h2cvl5
time="2022-09-01T00:17:50Z" level=debug msg="Revoked ingress permissions" id=sg-03265ad2fae661b8c installID=55h2cvl5
time="2022-09-01T00:17:50Z" level=debug msg="Revoked egress permissions" id=sg-03265ad2fae661b8c installID=55h2cvl5
time="2022-09-01T00:17:50Z" level=debug msg="DependencyViolation: resource sg-03265ad2fae661b8c has a dependent object\n\tstatus code: 400, request id: f7c35709-a23d-49fd-ac6a-f092661f6966" arn="arn:aws:ec2:ca-central-1:646284873784:security-group/sg-03265ad2fae661b8c" installID=55h2cvl5
time="2022-09-01T00:17:51Z" level=info msg=Deleted id=eni-0e592a2768c157360 installID=55h2cvl5
time="2022-09-01T00:17:52Z" level=debug msg="listing AWS hosted zones \"rosa-mv9dx3.0ffs.p1.openshiftapps.com.\" (page 0)" id=Z072427539WBI718F6BCC installID=55h2cvl5
time="2022-09-01T00:17:52Z" level=debug msg="listing AWS hosted zones \"0ffs.p1.openshiftapps.com.\" (page 0)" id=Z072427539WBI718F6BCC installID=55h2cvl5
time="2022-09-01T00:17:53Z" level=info msg=Deleted id=Z072427539WBI718F6BCC installID=55h2cvl5
time="2022-09-01T00:17:53Z" level=debug msg="Revoked ingress permissions" id=sg-08bfbb32ea92f583e installID=55h2cvl5
time="2022-09-01T00:17:53Z" level=debug msg="Revoked egress permissions" id=sg-08bfbb32ea92f583e installID=55h2cvl5
time="2022-09-01T00:17:54Z" level=info msg=Deleted id=sg-08bfbb32ea92f583e installID=55h2cvl5
time="2022-09-01T00:17:54Z" level=info msg=Deleted id=rosa-mv9dx3-xls7g-aint/635162452c08e059 installID=55h2cvl5
time="2022-09-01T00:17:54Z" level=info msg=Deleted id=eni-049f0174866d87270 installID=55h2cvl5
time="2022-09-01T00:17:54Z" level=debug msg="search for matching resources by tag in ca-central-1 matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:17:55Z" level=debug msg="search for matching resources by tag in us-east-1 matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:17:55Z" level=debug msg="no deletions from us-east-1, removing client" installID=55h2cvl5
time="2022-09-01T00:17:55Z" level=debug msg="search for IAM roles" installID=55h2cvl5
time="2022-09-01T00:17:56Z" level=debug msg="iterating over a page of 64 IAM roles" installID=55h2cvl5
time="2022-09-01T00:17:56Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-ConfigRecorderRole-B749E1E6: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-ConfigRecorderRole-B749E1E6 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 06b804ae-160c-4fa7-92de-fd69adc07db2" installID=55h2cvl5
time="2022-09-01T00:17:56Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-CWL-Add-Subscription-Filter-9D3CF73C: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-CWL-Add-Subscription-Filter-9D3CF73C with an explicit deny in a service control policy\n\tstatus code: 403, request id: 2a5dd4ad-9c3e-40ee-b478-73c79671d744" installID=55h2cvl5
time="2022-09-01T00:17:56Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-S4CHZ22EC1B2: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-S4CHZ22EC1B2 with an explicit deny in a service control policy\n\tstatus code: 403, request id: e61daee8-6d2c-4707-b4c9-c4fdd6b5091c" installID=55h2cvl5
time="2022-09-01T00:17:56Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-X9UQK0CYNPPO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-X9UQK0CYNPPO with an explicit deny in a service control policy\n\tstatus code: 403, request id: 1b743447-a778-4f9e-8b48-5923fd5c14ce" installID=55h2cvl5
time="2022-09-01T00:17:56Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomCentralEndpointDep-1H6K6CZ6AEUBO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomCentralEndpointDep-1H6K6CZ6AEUBO with an explicit deny in a service control policy\n\tstatus code: 403, request id: da8c8a42-8e79-48e5-b548-c604cb10d6f4" installID=55h2cvl5
time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomCreateSSMDocument7-1JDO2BN7QTXRH: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomCreateSSMDocument7-1JDO2BN7QTXRH with an explicit deny in a service control policy\n\tstatus code: 403, request id: 7d7840e4-a1b4-4ea2-bb83-9ee55882de54" installID=55h2cvl5
time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomEBSDefaultEncrypti-19EVAXFRG2BEJ: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomEBSDefaultEncrypti-19EVAXFRG2BEJ with an explicit deny in a service control policy\n\tstatus code: 403, request id: 7f2e04ed-8c49-42e4-b35e-563093a57e5b" installID=55h2cvl5
time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomEc2OperationsB1799-1WASK5J6GUYHO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomEc2OperationsB1799-1WASK5J6GUYHO with an explicit deny in a service control policy\n\tstatus code: 403, request id: cd2b4962-e610-4cc4-92bc-827fe7a49b48" installID=55h2cvl5
time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomGetDetectorIdRole6-9VGPM8U0HMV7: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomGetDetectorIdRole6-9VGPM8U0HMV7 with an explicit deny in a service control policy\n\tstatus code: 403, request id: be005a09-f62c-4894-8c82-70c375d379a9" installID=55h2cvl5
time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomGuardDutyCreatePub-1W03UREYK3KTX: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomGuardDutyCreatePub-1W03UREYK3KTX with an explicit deny in a service control policy\n\tstatus code: 403, request id: 541d92f4-33ce-4a50-93d8-dcfd2306eeb0" installID=55h2cvl5
time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMCreateRoleE62B6-1AQL8IBN9938I: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMCreateRoleE62B6-1AQL8IBN9938I with an explicit deny in a service control policy\n\tstatus code: 403, request id: 6dd81743-94c4-479a-b945-ffb1af763007" installID=55h2cvl5
time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMPasswordPolicyC-16TPLHRY1FZ43: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMPasswordPolicyC-16TPLHRY1FZ43 with an explicit deny in a service control policy\n\tstatus code: 403, request id: a269f47b-97bc-4609-b124-d1ef5d997a91" installID=55h2cvl5
time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsLogGroup49AC86-1D03LOLE2CARP: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsLogGroup49AC86-1D03LOLE2CARP with an explicit deny in a service control policy\n\tstatus code: 403, request id: 33c3c0a5-e5c9-4125-9400-aafb363c683c" installID=55h2cvl5
time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsMetricFilter7F-DLA5E1PZSFHH: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsMetricFilter7F-DLA5E1PZSFHH with an explicit deny in a service control policy\n\tstatus code: 403, request id: 32e87471-6d21-42a7-bfd8-d5323856f94d" installID=55h2cvl5
time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieExportConfigR-1QT1WNNWPSL36: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieExportConfigR-1QT1WNNWPSL36 with an explicit deny in a service control policy\n\tstatus code: 403, request id: b2cc6745-0217-44fe-a48b-44e56e889c9e" installID=55h2cvl5
time="2022-09-01T00:17:57Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieUpdateSession-1NHBPTB4GOSM8: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieUpdateSession-1NHBPTB4GOSM8 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 09f81582-6685-4dc9-99f0-ed33565ab4f4" installID=55h2cvl5
time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomResourceCleanupC59-1MSCB57N479UU: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomResourceCleanupC59-1MSCB57N479UU with an explicit deny in a service control policy\n\tstatus code: 403, request id: cea9116c-2b54-4caa-9776-83559d27b8f8" installID=55h2cvl5
time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomS3PutReplicationRo-FE5Q26BTAG9K: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomS3PutReplicationRo-FE5Q26BTAG9K with an explicit deny in a service control policy\n\tstatus code: 403, request id: 430d7750-c538-42a5-84b5-52bc77ce2d56" installID=55h2cvl5
time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomSecurityHubRole660-1UX115B9Q68WX: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomSecurityHubRole660-1UX115B9Q68WX with an explicit deny in a service control policy\n\tstatus code: 403, request id: 279038e4-f3c9-4700-b590-9a90f9b8d3a2" installID=55h2cvl5
time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomSSMUpdateRoleD3D5C-AZ9GBJG6UM4F: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomSSMUpdateRoleD3D5C-AZ9GBJG6UM4F with an explicit deny in a service control policy\n\tstatus code: 403, request id: 5e2f40ae-3dc7-4773-a5cd-40bf9aa36c03" installID=55h2cvl5
time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomVpcDefaultSecurity-HC931RYMVKKC: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomVpcDefaultSecurity-HC931RYMVKKC with an explicit deny in a service control policy\n\tstatus code: 403, request id: 92a27a7b-14f5-455b-aa39-3c995806b83e" installID=55h2cvl5
time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-DefaultBucketReplication-OIM43YBJSMGD: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-DefaultBucketReplication-OIM43YBJSMGD with an explicit deny in a service control policy\n\tstatus code: 403, request id: 0da4f66c-c6b1-453c-a8c8-dc0399b24bb9" installID=55h2cvl5
time="2022-09-01T00:17:58Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-PipelineRole: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-PipelineRole with an explicit deny in a service control policy\n\tstatus code: 403, request id: f2c94beb-a222-4bad-abe1-8de5786f5e59" installID=55h2cvl5
time="2022-09-01T00:17:58Z" level=info msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-VPC-FlowLog-519F0B57: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-VPC-FlowLog-519F0B57 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 829c3569-b2f2-4b9d-94a0-69644b690066" installID=55h2cvl5
time="2022-09-01T00:17:58Z" level=debug msg="search for IAM users" installID=55h2cvl5
time="2022-09-01T00:17:58Z" level=debug msg="iterating over a page of 0 IAM users" installID=55h2cvl5
time="2022-09-01T00:17:58Z" level=debug msg="search for IAM instance profiles" installID=55h2cvl5
time="2022-09-01T00:17:58Z" level=info msg="error while finding resources to delete" error="get tags for arn:aws:iam::646284873784:role/PBMMAccel-VPC-FlowLog-519F0B57: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-VPC-FlowLog-519F0B57 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 829c3569-b2f2-4b9d-94a0-69644b690066" installID=55h2cvl5
time="2022-09-01T00:18:09Z" level=info msg=Deleted id=sg-03265ad2fae661b8c installID=55h2cvl5
time="2022-09-01T00:18:09Z" level=debug msg="search for matching resources by tag in ca-central-1 matching aws.Filter{\"kubernetes.io/cluster/rosa-mv9dx3-xls7g\":\"owned\"}" installID=55h2cvl5
time="2022-09-01T00:18:09Z" level=debug msg="no deletions from ca-central-1, removing client" installID=55h2cvl5
time="2022-09-01T00:18:09Z" level=debug msg="search for IAM roles" installID=55h2cvl5
time="2022-09-01T00:18:10Z" level=debug msg="iterating over a page of 64 IAM roles" installID=55h2cvl5
time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-ConfigRecorderRole-B749E1E6: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-ConfigRecorderRole-B749E1E6 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 0e8e0bea-b512-469b-a996-8722a0f7fa25" installID=55h2cvl5
time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-CWL-Add-Subscription-Filter-9D3CF73C: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-CWL-Add-Subscription-Filter-9D3CF73C with an explicit deny in a service control policy\n\tstatus code: 403, request id: 288456a2-0cd5-46f1-a5d2-6b4006a5dc0e" installID=55h2cvl5
time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-S4CHZ22EC1B2: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-S4CHZ22EC1B2 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 321df940-70fc-45e7-8c56-59fe5b89e84f" installID=55h2cvl5
time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-X9UQK0CYNPPO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-AWS679f53fac002430cb0da5-X9UQK0CYNPPO with an explicit deny in a service control policy\n\tstatus code: 403, request id: 45bebf36-8bf9-4c78-a80f-c6a5e98b2187" installID=55h2cvl5
time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomCentralEndpointDep-1H6K6CZ6AEUBO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomCentralEndpointDep-1H6K6CZ6AEUBO with an explicit deny in a service control policy\n\tstatus code: 403, request id: eea00ae2-1a72-43f9-9459-a1c003194137" installID=55h2cvl5
time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomCreateSSMDocument7-1JDO2BN7QTXRH: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomCreateSSMDocument7-1JDO2BN7QTXRH with an explicit deny in a service control policy\n\tstatus code: 403, request id: 0ef5a102-b764-4e17-999f-d820ebc1ec12" installID=55h2cvl5
time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomEBSDefaultEncrypti-19EVAXFRG2BEJ: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomEBSDefaultEncrypti-19EVAXFRG2BEJ with an explicit deny in a service control policy\n\tstatus code: 403, request id: 107d0ccf-94e7-41c4-96cd-450b66a84101" installID=55h2cvl5
time="2022-09-01T00:18:10Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomEc2OperationsB1799-1WASK5J6GUYHO: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomEc2OperationsB1799-1WASK5J6GUYHO with an explicit deny in a service control policy\n\tstatus code: 403, request id: da9bd868-8384-4072-9fb4-e6a66e94d2a1" installID=55h2cvl5
time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomGetDetectorIdRole6-9VGPM8U0HMV7: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomGetDetectorIdRole6-9VGPM8U0HMV7 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 74fbf44c-d02d-4072-b038-fa456246b6a8" installID=55h2cvl5
time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomGuardDutyCreatePub-1W03UREYK3KTX: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomGuardDutyCreatePub-1W03UREYK3KTX with an explicit deny in a service control policy\n\tstatus code: 403, request id: 365116d6-1467-49c3-8f58-1bc005aa251f" installID=55h2cvl5
time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMCreateRoleE62B6-1AQL8IBN9938I: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMCreateRoleE62B6-1AQL8IBN9938I with an explicit deny in a service control policy\n\tstatus code: 403, request id: 20f91de5-cfeb-45e0-bb46-7b66d62cc749" installID=55h2cvl5
time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMPasswordPolicyC-16TPLHRY1FZ43: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomIAMPasswordPolicyC-16TPLHRY1FZ43 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 924fa288-f1b9-49b8-b549-a930f6f771ce" installID=55h2cvl5
time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsLogGroup49AC86-1D03LOLE2CARP: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsLogGroup49AC86-1D03LOLE2CARP with an explicit deny in a service control policy\n\tstatus code: 403, request id: 4beb233d-40d6-4016-872a-8757af8f98ee" installID=55h2cvl5
time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsMetricFilter7F-DLA5E1PZSFHH: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomLogsMetricFilter7F-DLA5E1PZSFHH with an explicit deny in a service control policy\n\tstatus code: 403, request id: 77951f62-e0b4-4a9b-a20c-ea40d6432e84" installID=55h2cvl5
time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieExportConfigR-1QT1WNNWPSL36: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieExportConfigR-1QT1WNNWPSL36 with an explicit deny in a service control policy\n\tstatus code: 403, request id: 13ad38c8-89dc-461d-9763-870eec3a6ba1" installID=55h2cvl5
time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieUpdateSession-1NHBPTB4GOSM8: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomMacieUpdateSession-1NHBPTB4GOSM8 with an explicit deny in a service control policy\n\tstatus code: 403, request id: a8fe199d-12fb-4141-a944-c7c5516daf25" installID=55h2cvl5
time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomResourceCleanupC59-1MSCB57N479UU: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomResourceCleanupC59-1MSCB57N479UU with an explicit deny in a service control policy\n\tstatus code: 403, request id: b487c62f-5ac5-4fa0-b835-f70838b1d178" installID=55h2cvl5
time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomS3PutReplicationRo-FE5Q26BTAG9K: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomS3PutReplicationRo-FE5Q26BTAG9K with an explicit deny in a service control policy\n\tstatus code: 403, request id: 97bfcb55-ae1f-4859-9c12-03de09607f79" installID=55h2cvl5
time="2022-09-01T00:18:11Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomSecurityHubRole660-1UX115B9Q68WX: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomSecurityHubRole660-1UX115B9Q68WX with an explicit deny in a service control policy\n\tstatus code: 403, request id: ca1094f6-714e-4042-9134-75f4c6d9d0df" installID=55h2cvl5
time="2022-09-01T00:18:12Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomSSMUpdateRoleD3D5C-AZ9GBJG6UM4F: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomSSMUpdateRoleD3D5C-AZ9GBJG6UM4F with an explicit deny in a service control policy\n\tstatus code: 403, request id: ca1db477-ee6a-4d03-8b57-52b335b2bbe6" installID=55h2cvl5
time="2022-09-01T00:18:12Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-CustomVpcDefaultSecurity-HC931RYMVKKC: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-CustomVpcDefaultSecurity-HC931RYMVKKC with an explicit deny in a service control policy\n\tstatus code: 403, request id: 1fc32d09-588b-4d80-ad62-748f7fb55efd" installID=55h2cvl5
time="2022-09-01T00:18:12Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-Mv9dx3Rosa81Ebf-DefaultBucketReplication-OIM43YBJSMGD: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-Mv9dx3Rosa81Ebf-DefaultBucketReplication-OIM43YBJSMGD with an explicit deny in a service control policy\n\tstatus code: 403, request id: 7d906cc2-eaaa-439b-97e0-503615ce5d43" installID=55h2cvl5
time="2022-09-01T00:18:12Z" level=debug msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-PipelineRole: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-PipelineRole with an explicit deny in a service control policy\n\tstatus code: 403, request id: ee6a5647-20b1-4880-932b-bfd70b945077" installID=55h2cvl5
time="2022-09-01T00:18:12Z" level=info msg="get tags for arn:aws:iam::646284873784:role/PBMMAccel-VPC-FlowLog-519F0B57: AccessDenied: User: arn:aws:sts::646284873784:assumed-role/ROSA-Installer-Role/1661991167715690329 is not authorized to perform: iam:GetRole on resource: role PBMMAccel-VPC-FlowLog-519F0B57 with an explicit deny in a service control policy\n\tstatus code: 403, request id: a424891e-48ab-4ad4-9150-9ef1076dcb9c" installID=55h2cvl5

The "not authorized" errors above repeat, probably 50+ times.

Expected results:

For these errors not to show up during install.

Additional info:

Again, this only occurs because ROSA is installed in an AWS SEA environment - https://github.com/aws-samples/aws-secure-environment-accelerator.
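
The roles listed above belong to the SEA landing zone rather than to the cluster, so an explicit SCP deny on iam:GetRole is expected for them. A minimal sketch of how such AccessDenied responses could be skipped quietly during the tag search instead of being reported on every pass (illustrative only, not the installer's actual code; getTags and the helper names are hypothetical):

package sketch

import (
	"log"

	"github.com/aws/aws-sdk-go/aws/awserr"
)

// isAccessDenied reports whether err is an explicit AWS AccessDenied error,
// for example one produced by a service control policy.
func isAccessDenied(err error) bool {
	if aerr, ok := err.(awserr.Error); ok {
		return aerr.Code() == "AccessDenied"
	}
	return false
}

// filterOwnedRoles keeps only the roles whose tags mark them as owned by the
// cluster. Roles whose tags cannot be read because of an explicit deny are
// skipped with a debug-style log line instead of being treated as errors on
// every deletion pass.
func filterOwnedRoles(arns []string, getTags func(arn string) (map[string]string, error), ownedTag string) []string {
	var owned []string
	for _, arn := range arns {
		tags, err := getTags(arn)
		if err != nil {
			if isAccessDenied(err) {
				log.Printf("skipping role %s: %v", arn, err)
				continue
			}
			log.Printf("get tags for %s: %v", arn, err)
			continue
		}
		if _, ok := tags[ownedTag]; ok {
			owned = append(owned, arn)
		}
	}
	return owned
}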

This is a clone of issue OCPBUGS-18396. The following is the description of the original issue:

CI is almost permanently failing on MTU migration in 4.14 (both SDN and OVN-Kubernetes):

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-cluster-network-operator-master-e2e-network-mtu-migration-sdn-ipv4

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-cluster-network-operator-master-e2e-network-mtu-migration-ovn-ipv4

 

The common issue appears to be that waiting for the MCO times out:

+ echo '[2023-08-31T03:58:16+00:00] Waiting for final Machine Controller Config...'
[2023-08-31T03:58:16+00:00] Waiting for final Machine Controller Config...
+ timeout 900s bash
migration field is not cleaned by MCO
migration field is not cleaned by MCO
migration field is not cleaned by MCO
migration field is not cleaned by MCO
...

https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/1979/pull-ci-openshift-cluster-network-operator-master-e2e-network-mtu-migration-sdn-ipv4/1697077984948654080/build-log.txt

Description of problem:

The OCP upgrade is blocked because the csi-snapshot-controller cluster operator fails to start its deployment with a fatal "read-only filesystem" message

Version-Release number of selected component (if applicable):

Red Hat OpenShift 4.11
rhacs-operator.v3.72.1

How reproducible:

At least once in a user's cluster while upgrading

Steps to Reproduce:

1. Have a OCP 4.11 installed
2. Install ACS on top of the OCP cluster
3. Upgrade OCP to the next z-stream version

Actual results:

Upgrade gets blocked: waiting on csi-snapshot-controller

Expected results:

Upgrade should succeed

Additional info:

The stackrox SCCs (stackrox-admission-control, stackrox-collector, and stackrox-sensor) set `readOnlyRootFilesystem` to `true`. If a Pod does not explicitly define/request an SCC, it might be assigned one of these SCCs, which makes its deployment fail with a `read-only filesystem` message.

Description of problem:

The PipelineRun view has a Duration column, but the TaskRun view inside it does not.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Have an OpenShift Pipeline with 2+ tasks configured and invoked

Steps to Reproduce:

1. Once a PipelineRun is invoked, navigate to its TaskRuns
2. Note that the TaskRuns list shows columns like Status and Started, but no Duration

Actual results:

 

Expected results:

 

Additional info:

I'll add screenshots for PipelineRuns and TaskRuns

Nodes in Ironic are created following the pattern <namespace>~<host name>.

However, when creating nodes in Ironic, baremetal-operator first creates them without a namespace and only prepends the namespace prefix later. This opens the possibility of node name clashes, especially in the ACM context.
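
A minimal sketch of building the prefixed node name up front, following the <namespace>~<host name> pattern described above, so two hosts with the same name in different namespaces can never clash (illustrative only, not the baremetal-operator code):

package sketch

import "fmt"

// ironicNodeName returns the Ironic node name with the namespace prefix
// applied from the start, instead of creating the node first and renaming it
// later.
func ironicNodeName(namespace, hostName string) string {
	return fmt.Sprintf("%s~%s", namespace, hostName)
}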

Description of problem:

In the web console Administrator view, the items under "Observe" in the side navigation menu are duplicated.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

This is happening because those menu items are now provided by the `monitoring-plugin` dynamic plugin, so we need to remove them from the web console codebase.

Description of problem:


NodePool conditions AllMachinesReady and AllNodesHealthy are used by Cluster Service to detect problems on customer nodes.

Every time a NodePool is updated, it triggers an update to a ManifestWork that is processed by CS to build a user-facing message about why a specific machinepool/nodepool is not healthy.

Because the message is not sorted when more than one machine is involved, the NodePool is updated multiple times even though the underlying state is the same.

For example, CS may capture sequences like the following, where the same machines are reported in a different order each time, so every update looks like a new change even though the state is the same:

Machine rosa-vws58-workshop-69b55d58b-mq44p: UnhealthyNode
Machine rosa-vws58-workshop-69b55d58b-97n47: UnhealthyNode ,
Machine rosa-vws58-workshop-69b55d58b-mq44p: NodeConditionsFailed
Machine rosa-vws58-workshop-69b55d58b-97n47: Deleting ,

Machine rosa-vws58-workshop-69b55d58b-97n47: UnhealthyNode
Machine rosa-vws58-workshop-69b55d58b-mq44p: UnhealthyNode ,
Machine rosa-vws58-workshop-69b55d58b-97n47: Deleting
Machine rosa-vws58-workshop-69b55d58b-mq44p: NodeConditionsFailed ,

Machine rosa-vws58-workshop-69b55d58b-mq44p: UnhealthyNode
Machine rosa-vws58-workshop-69b55d58b-97n47: UnhealthyNode ,
Machine rosa-vws58-workshop-69b55d58b-mq44p: NodeConditionsFailed
Machine rosa-vws58-workshop-69b55d58b-97n47: Deleting ,

Expected results:


The HyperShift Operator should sort the messages when multiple machines/nodes are involved:

https://github.com/openshift/hypershift/blob/86af31a5a5cdee3da0d7f65f3bd550f4ec9cac55/hypershift-operator/controllers/nodepool/nodepool_controller.go#L2509
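
A minimal sketch of deterministic message aggregation, so that the same set of unhealthy machines always produces the same condition message regardless of iteration order (illustrative only, not the actual HyperShift change):

package sketch

import (
	"sort"
	"strings"
)

// aggregateMachineMessages joins per-machine messages into a single condition
// message. Sorting first makes the result stable, so consumers such as CS do
// not see a "new" message when only the ordering changed.
func aggregateMachineMessages(messages []string) string {
	sorted := append([]string(nil), messages...)
	sort.Strings(sorted)
	return strings.Join(sorted, ", ")
}

Sorting at the point where the message is built keeps the underlying machine list untouched while making the condition text stable.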

Description of problem:
Older images are pulled even when using minVersion in ImageSetConfiguration.

Version-Release number of selected component (if applicable):
oc mirror version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"4.11.0-202208031306.p0.g3c1c80c.assembly.stream-3c1c80c", GitCommit:"3c1c80ca6a5a22b5826c88897e7a9e5acd7c1a96", GitTreeState:"clean", BuildDate:"2022-08-03T14:23:35Z", GoVersion:"go1.18.4", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:
Always

Steps to Reproduce:
1. get attached ImageSetConfiguration
2. run 'oc mirror --config=./image-set.yaml docker://<yourRegistry> --continue-on-error'

Actual results:
Output contains a lot of 'unable to retrieve source image' errors for images that are older than the defined minVersion. Those images are known to be missing; the goal was to use minVersion to filter out the older images and get rid of those errors, but it is not working.

Expected results:
Those older images should not be included
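
A minimal sketch of the expected filtering, assuming plain semantic-version ordering (the belowMinVersion helper is hypothetical and only illustrates the expected behaviour; it is not oc-mirror code):

package sketch

import "github.com/Masterminds/semver/v3"

// belowMinVersion reports whether a bundle version is older than the
// minVersion from the ImageSetConfiguration and should therefore be skipped
// when mirroring.
func belowMinVersion(bundleVersion, minVersion string) (bool, error) {
	v, err := semver.NewVersion(bundleVersion)
	if err != nil {
		return false, err
	}
	floor, err := semver.NewVersion(minVersion)
	if err != nil {
		return false, err
	}
	return v.LessThan(floor), nil
}

// With the versions from this report, belowMinVersion("1.0.11", "2.2.1-0")
// returns true, so the 1.0.11 images would not be mirrored.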

Additional info:
image-set.yaml is attached
Full output of 'oc mirror' attached
There are more images failing but as an example:

error: unable to retrieve source image registry.redhat.io/openshift-service-mesh/pilot-rhel8 manifest sha256:f7c468b5a35bfce54e53b4d8d00438f33a0861549697d14445eae52d8ead9a68: for image pulls. Use the equivalent V2 schema 2 manifest digest instead. For more information see https://access.redhat.com/articles/6138332

This image is from version 1.0.11, but minVersion is '2.2.1-0', so it should not be included.
Here is how I checked that image:

podman inspect registry-proxy.engineering.redhat.com/rh-osbs/openshift-service-mesh-pilot-rhel8@sha256:f7c468b5a35bfce54e53b4d8d00438f33a0861549697d14445eae52d8ead9a68 | grep version
                "istio_version": "1.1.17",
                "version": "1.0.11"
            "istio_version": "1.1.17",
            "version": "1.0.11"


Allow creating a single NAT gateway for a multi-zone hosted cluster. The route tables in the other zones should point to the one NAT gateway.

This allows running a cluster in multiple zones with a single NAT gateway, since NAT gateways can be expensive to run in AWS.
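
A minimal sketch, assuming the AWS SDK for Go v1, of pointing every zone's private route table at the one NAT gateway (illustrative only, not the HyperShift implementation):

package sketch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
	"github.com/aws/aws-sdk-go/service/ec2/ec2iface"
)

// pointRouteTablesAtNATGateway adds a default route to the single NAT gateway
// in each zone's private route table, so one NAT gateway serves all zones.
func pointRouteTablesAtNATGateway(client ec2iface.EC2API, routeTableIDs []string, natGatewayID string) error {
	for _, rtID := range routeTableIDs {
		_, err := client.CreateRoute(&ec2.CreateRouteInput{
			RouteTableId:         aws.String(rtID),
			DestinationCidrBlock: aws.String("0.0.0.0/0"),
			NatGatewayId:         aws.String(natGatewayID),
		})
		if err != nil {
			return err
		}
	}
	return nil
}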

This is a clone of issue OCPBUGS-19546. The following is the description of the original issue:

Testcases:

 1. Create a configmap from a file with 77 characters in a line

File data:
tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt

CLI data:

$ oc get cm cm-test4 -o yaml
apiVersion: v1
data:
  cm-test4: |                                                                              ##Noticed the Literal style
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
kind: ConfigMap
metadata:
  creationTimestamp: "2022-09-28T12:39:43Z"
  name: cm-test4
  namespace: configmap-test
  resourceVersion: "8962738"
  uid: cf0e264b-72fb-4df7-bd3a-f3ed62423367


UI data:

kind: ConfigMap
apiVersion: v1
metadata:
  name: cm-test4
  namespace: configmap-test
  uid: cf0e264b-72fb-4df7-bd3a-f3ed62423367
  resourceVersion: '8962738'
  creationTimestamp: '2022-09-28T12:39:43Z'
  managedFields:
    - manager: kubectl-create
      operation: Update
      apiVersion: v1
      time: '2022-09-28T12:39:43Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          .: {}
          'f:cm-test4': {}
data:
  cm-test4: |                                                                      ##Noticed the Literal style
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
 

 

2. Create a configmap from a file with more than 78 characters in a line

 

File Data:
tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt

CLI Data:

$ oc get cm cm-test5 -o yaml
apiVersion: v1
data:
  cm-test5: |                                                                              ##Noticed the Literal style
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
    ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt
kind: ConfigMap
metadata:
  creationTimestamp: "2022-09-28T12:39:54Z"
  name: cm-test5
  namespace: configmap-test
  resourceVersion: "8962813"
  uid: b8b12653-588a-4afc-8ed9-ff7c6ebaefb1

UI data:

kind: ConfigMap
apiVersion: v1
metadata:
  name: cm-test5
  namespace: configmap-test
  uid: b8b12653-588a-4afc-8ed9-ff7c6ebaefb1
  resourceVersion: '8962813'
  creationTimestamp: '2022-09-28T12:39:54Z'
  managedFields:
    - manager: kubectl-create
      operation: Update
      apiVersion: v1
      time: '2022-09-28T12:39:54Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:data':
          .: {}
          'f:cm-test5': {}
data:
  cm-test5: >                                                                         ##Noticed the Folded style and newlines in between data
    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt

    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

    ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss

    tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt 

Conclusion:

When the ConfigMap is created with more than 78 characters in a single line, the YAML editor in the web UI changes the block style from literal to folded, and blank lines appear between the data lines.
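
For reference, the literal (|) and folded (>) block styles decode to the same string, so the stored data is unchanged even though the editor presents it differently. A small self-contained check (using gopkg.in/yaml.v3 purely for illustration; this is not what the console uses):

package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

const literal = "key: |\n  aaaa\n  bbbb\n"

// In folded style a blank line encodes a newline, which is why the UI output
// above shows empty lines between the data lines.
const folded = "key: >\n  aaaa\n\n  bbbb\n"

func main() {
	var a, b map[string]string
	if err := yaml.Unmarshal([]byte(literal), &a); err != nil {
		panic(err)
	}
	if err := yaml.Unmarshal([]byte(folded), &b); err != nil {
		panic(err)
	}
	fmt.Println(a["key"] == b["key"]) // prints true
}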

 

 

 

Description of problem:

All the DaemonSets defined within the openshift-multus namespace have a node selector on the kubernetes.io/os label to schedule their pods only on Linux workers. The whereabouts-reconciler seems to be missing it. We might need to add the `kubernetes.io/os: linux` node selector to stay consistent with the other DaemonSet definitions and avoid risks on clusters with Windows workers.

Version-Release number of selected component (if applicable):

4.13+

How reproducible:

Always

Steps to Reproduce:

1. oc get daemonsets -n openshift-multus
NAME                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
multus                          6         6         6       6            6           kubernetes.io/os=linux   4h1m
multus-additional-cni-plugins   6         6         6       6            6           kubernetes.io/os=linux   4h1m
multus-networkpolicy            6         6         6       6            6           kubernetes.io/os=linux   19s

 

Actual results:

network-metrics-daemon          6         6         6       6            6           kubernetes.io/os=linux   4h1m
whereabouts-reconciler          6         6         6       6            6           <none>                   23s

Note the missing kubernetes.io/os node selector.

Expected results:

The whereabouts-reconciler should also have the nodeSelector term kubernetes.io/os: linux (a minimal sketch is shown below).
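A minimal sketch of the expected change (the surrounding DaemonSet spec is abbreviated; only the added selector matters here):

~~~
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: whereabouts-reconciler
  namespace: openshift-multus
spec:
  template:
    spec:
      # same selector as the other openshift-multus DaemonSets
      nodeSelector:
        kubernetes.io/os: linux
~~~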

Additional info:

https://redhat-internal.slack.com/archives/CFFSAHWHF/p1687158805205059

Currently the assisted installer adds a dracut hook to the ISO that is executed early during the boot process. That hook generates the NetworkManager configuration files that will be used during boot and also once the machine is installed. But that hook is not guaranteed to run before NetworkManager, and the files it generates may not be loaded by NetworkManager at the right time. We have seen such issues in the recent upgrade from RHEL 8 to RHEL 9 that is part of OpenShift 4.13. The RHCOS team recommends replacing the hook with a systemd unit that runs before NetworkManager.

Description of problem:

NAT gateway is not yet a supported feature and the current implementation is a partial non-zonal solution.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

always

Steps to Reproduce:

1. Set OutboundType = NatGateway
2. Deploy cluster
3.

Actual results:

Install successful

Expected results:

Install requires TechPreviewNoUpgrade before proceeding

Additional info:

 

Description of problem:

The Multus macvlan/ipvlan/vlan CNI plugins panic when the master interface is missing in the container.

Version-Release number of selected component (if applicable):

metallb-operator.v4.13.0-202304190216   MetalLB Operator   4.13.0-202304190216 Succeeded

How reproducible:

Create pod with multiple vlan interfaces connected to missing master interface.

Steps to Reproduce:

1. Create a pod with multiple VLAN interfaces connected to a missing master interface in the container
2. Make sure the pod is stuck in the ContainerCreating state
3. Run oc describe pod PODNAME and read the crash message:

 Normal   Scheduled               22s   default-scheduler  Successfully assigned cni-tests/pod-one to worker-0
  Normal   AddedInterface          21s   multus             Add eth0 [10.128.2.231/23] from ovn-kubernetes
  Normal   AddedInterface          21s   multus             Add ext0 [] from cni-tests/tap-one
  Normal   AddedInterface          21s   multus             Add ext0.1 [2001:100::1/64] from cni-tests/mac-vlan-one
  Warning  FailedCreatePodSandBox  18s   kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_pod-one_cni-tests_2e831519-effc-4502-8ea7-749eda95bf1d_0(321d7181626b8bbfad062dd7c7cc2ef096f8547e93cb7481a18b7d3613eabffd): error adding pod cni-tests_pod-one to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [cni-tests/pod-one/2e831519-effc-4502-8ea7-749eda95bf1d:mac-vlan]: error adding container to network "mac-vlan": plugin type="macvlan" failed (add): netplugin failed: "panic: runtime error: invalid memory address or nil pointer dereference\n[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x54281a]\n\ngoroutine 1 [running, locked to thread]:\npanic({0x560b00, 0x6979d0})\n\t/usr/lib/golang/src/runtime/panic.go:987 +0x3ba fp=0xc0001ad8f0 sp=0xc0001ad830 pc=0x433d7a\nruntime.panicmem(...)\n\t/usr/lib/golang/src/runtime/panic.go:260\nruntime.sigpanic()\n\t/usr/lib/golang/src/runtime/signal_unix.go:835 +0x2f6 fp=0xc0001ad940 sp=0xc0001ad8f0 pc=0x449cd6\nmain.getMTUByName({0xc00001a978, 0x4}, {0xc00002004a, 0x33}, 0x1)\n\t/usr/src/plugins/plugins/main/macvlan/macvlan.go:167 +0x33a fp=0xc0001ada00 sp=0xc0001ad940 pc=0x54281a\nmain.loadConf(0xc000186770, {0xc00001e009, 0x19e})\n\t/usr/src/plugins/plugins/main/macvlan/macvlan.go:120 +0x155 fp=0xc0001ada80 sp=0xc0001ada00 pc=0x5422d5\nmain.cmdAdd(0xc000186770)\n\t/usr/src/plugins/plugins/main/macvlan/macvlan.go:287 +0x47 fp=0xc0001adcd0 sp=0xc0001ada80 pc=0x543b07\ngithub.com/containernetworking/cni/pkg/skel.(*dispatcher).checkVersionAndCall(0xc0000bdec8, 0xc000186770, {0x5c02b8, 0xc0000e4e40}, 0x592e80)\n\t/usr/src/plugins/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:166 +0x20a fp=0xc0001add60 sp=0xc0001adcd0 pc=0x5371ca\ngithub.com/containernetworking/cni/pkg/skel.(*dispatcher).pluginMain(0xc0000bdec8, 0x698320?, 0xc0000bdeb0?, 0x44ed89?, {0x5c02b8, 0xc0000e4e40}, {0xc0000000f0, 0x22})\n\t/usr/src/plugins/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:219 +0x2ca fp=0xc0001ade68 sp=0xc0001add60 pc=0x53772a\ngithub.com/containernetworking/cni/pkg/skel.PluginMainWithError(...)\n\t/usr/src/plugins/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:273\ngithub.com/containernetworking/cni/pkg/skel.PluginMain(0x588e01?, 0x10?, 0xc0000bdf50?, {0x5c02b8?, 0xc0000e4e40?}, {0xc0000000f0?, 0x0?})\n\t/usr/src/plugins/vendor/github.com/containernetworking/cni/pkg/skel/skel.go:288 +0xd1 fp=0xc0001adf18 sp=0xc0001ade68 pc=0x537d51\nmain.main()\n\t/usr/src/plugins/plugins/main/macvlan/macvlan.go:432 +0xb6 fp=0xc0001adf80 sp=0xc0001adf18 pc=0x544b76\nruntime.main()\n\t/usr/lib/golang/src/runtime/proc.go:250 +0x212 fp=0xc0001adfe0 sp=0xc0001adf80 pc=0x436a12\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0001adfe8 sp=0xc0001adfe0 pc=0x462fc1\n\ngoroutine 2 [force gc (idle)]:\nruntime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)\n\t/usr/lib/golang/src/runtime/proc.go:363 +0xd6 fp=0xc0000acfb0 sp=0xc0000acf90 pc=0x436dd6\nruntime.goparkunlock(...)\n\t/usr/lib/golang/src/runtime/proc.go:369\nruntime.forcegchelper()\n\t/usr/lib/golang/src/runtime/proc.go:302 +0xad fp=0xc0000acfe0 sp=0xc0000acfb0 pc=0x436c6d\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000acfe8 sp=0xc0000acfe0 pc=0x462fc1\ncreated by runtime.init.6\n\t/usr/lib/golang/src/runtime/proc.go:290 +0x25\n\ngoroutine 3 [GC sweep wait]:\nruntime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 
0x0?)\n\t/usr/lib/golang/src/runtime/proc.go:363 +0xd6 fp=0xc0000ad790 sp=0xc0000ad770 pc=0x436dd6\nruntime.goparkunlock(...)\n\t/usr/lib/golang/src/runtime/proc.go:369\nruntime.bgsweep(0x0?)\n\t/usr/lib/golang/src/runtime/mgcsweep.go:278 +0x8e fp=0xc0000ad7c8 sp=0xc0000ad790 pc=0x423e4e\nruntime.gcenable.func1()\n\t/usr/lib/golang/src/runtime/mgc.go:178 +0x26 fp=0xc0000ad7e0 sp=0xc0000ad7c8 pc=0x418d06\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000ad7e8 sp=0xc0000ad7e0 pc=0x462fc1\ncreated by runtime.gcenable\n\t/usr/lib/golang/src/runtime/mgc.go:178 +0x6b\n\ngoroutine 4 [GC scavenge wait]:\nruntime.gopark(0xc0000ca000?, 0x5bf2b8?, 0x1?, 0x0?, 0x0?)\n\t/usr/lib/golang/src/runtime/proc.go:363 +0xd6 fp=0xc0000adf70 sp=0xc0000adf50 pc=0x436dd6\nruntime.goparkunlock(...)\n\t/usr/lib/golang/src/runtime/proc.go:369\nruntime.(*scavengerState).park(0x6a0920)\n\t/usr/lib/golang/src/runtime/mgcscavenge.go:389 +0x53 fp=0xc0000adfa0 sp=0xc0000adf70 pc=0x421ef3\nruntime.bgscavenge(0x0?)\n\t/usr/lib/golang/src/runtime/mgcscavenge.go:617 +0x45 fp=0xc0000adfc8 sp=0xc0000adfa0 pc=0x4224c5\nruntime.gcenable.func2()\n\t/usr/lib/golang/src/runtime/mgc.go:179 +0x26 fp=0xc0000adfe0 sp=0xc0000adfc8 pc=0x418ca6\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000adfe8 sp=0xc0000adfe0 pc=0x462fc1\ncreated by runtime.gcenable\n\t/usr/lib/golang/src/runtime/mgc.go:179 +0xaa\n\ngoroutine 5 [finalizer wait]:\nruntime.gopark(0x0?, 0xc0000ac670?, 0xab?, 0x61?, 0xc0000ac770?)\n\t/usr/lib/golang/src/runtime/proc.go:363 +0xd6 fp=0xc0000ac628 sp=0xc0000ac608 pc=0x436dd6\nruntime.goparkunlock(...)\n\t/usr/lib/golang/src/runtime/proc.go:369\nruntime.runfinq()\n\t/usr/lib/golang/src/runtime/mfinal.go:180 +0x10f fp=0xc0000ac7e0 sp=0xc0000ac628 pc=0x417e0f\nruntime.goexit()\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1594 +0x1 fp=0xc0000ac7e8 sp=0xc0000ac7e0 pc=0x462fc1\ncreated by runtime.createfing\n\t/usr/lib/golang/src/runtime/mfinal.go:157 +0x45\n"

Actual results:

Instead of a readable error message, the plugin crashes with the panic output shown above.

Expected results:

We should handle this scenario without crashing, and the following log message should be used instead:

Error: Failed to create container due to the missing master interface XXX.
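A minimal sketch of the kind of guard that would produce such a message instead of the panic. This is illustrative only and not the actual containernetworking/plugins code; the real getMTUByName has a different signature.

~~~
package macvlansketch

import (
	"fmt"

	"github.com/vishvananda/netlink"
)

// getMTU returns the MTU of the master interface, or a readable error when the
// interface does not exist, instead of dereferencing a nil link and panicking.
func getMTU(masterName string) (int, error) {
	link, err := netlink.LinkByName(masterName)
	if err != nil {
		return 0, fmt.Errorf("failed to create container due to the missing master interface %q: %w", masterName, err)
	}
	return link.Attrs().MTU, nil
}
~~~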

Additional info:

 

Description of problem:

In Agent TUI, setting

IPV6 Configuration to Automatic

and enabling

Require IPV6 addressing for this connection

generates a message saying that the feature is not supported. The user is still allowed to quit the TUI (formally correct, given that 'Quit' is selected from the menu; perhaps the 'Quit' option should remain greyed out until a valid config is applied?), and the boot process proceeds using an unsupported/non-working network configuration.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-03-07-131556 

How reproducible:

 

Steps to Reproduce:

1. Feed the agent ISO with an agent-config.yaml file that defines an ipv6 only, static network configuration

2. Boot from the generated agent ISO, wait for the agent TUI to appear, select 'Edit a connection', then change IPv6 configuration from Manual to Automatic and, at the same time, enable the 'Require IPV6 addressing for this connection' option. Accept the changes.

3. (Not sure if this step is necessary) Once back in the main agent TUI screen, select 'Activate a connection'.
Select the currently active connection, de-activate and re-activate it.

4. Go back to main agent TUI screen, select Quit

Actual results:

The agent TUI displays the following message, then quits:

Failed to generate network state view: support for multiple default routes not yet implemented in agent-tui

Once the TUI quits, the boot process proceeds

Expected results:

The TUI should block the possibility of enabling unsupported configurations.

The agent TUI should inform the user about the unsupported configuration the moment it is applied (instead of the moment 'Quit' is selected) and should stay open until a valid network configuration is applied.

The TUI should put the boot process on hold until a valid network config is applied

Additional info:

OCP Version: 4.13.0-0.nightly-2023-03-07-131556 

agent-config.yaml snippet

  networkConfig:
    interfaces:
      - name: eno1
        type: ethernet
        state: up
        mac-address: 34:73:5A:9E:59:10
        ipv6:
          enabled: true
          address:
            - ip: 2620:52:0:1eb:3673:5aff:fe9e:5910
              prefix-length: 64
          dhcp: false

Description of problem:

InfrastructureRef is dereferenced without checking for a nil value.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Run TechPreview cluster
2. Try to create Cluster object with empty spec
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: example
  namespace: openshift-cluster-api
spec: {}
 3. Observe panic in cluster-capi-operator

Actual results:

2023/03/10 14:13:31 http: panic serving 10.129.0.2:39614: runtime error: invalid memory address or nil pointer dereference
goroutine 3619 [running]:
net/http.(*conn).serve.func1()
    /usr/lib/golang/src/net/http/server.go:1850 +0xbf
panic({0x16cada0, 0x2948bc0})
    /usr/lib/golang/src/runtime/panic.go:890 +0x262
github.com/openshift/cluster-capi-operator/pkg/webhook.(*ClusterWebhook).ValidateCreate(0xc000ceac00?, {0x24?, 0xc00090fff0?}, {0x1b72d68?, 0xc0010831e0?})
    /go/src/github.com/openshift/cluster-capi-operator/pkg/webhook/cluster.go:32 +0x39
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*validatorForType).Handle(_, {_, _}, {{{0xc000ceac00, 0x24}, {{0xc00090fff0, 0x10}, {0xc000838000, 0x7}, {0xc000838007, ...}}, ...}})
    /go/src/github.com/openshift/cluster-capi-operator/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/validator_custom.go:79 +0x2dd
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).Handle(_, {_, _}, {{{0xc000ceac00, 0x24}, {{0xc00090fff0, 0x10}, {0xc000838000, 0x7}, {0xc000838007, ...}}, ...}})
    /go/src/github.com/openshift/cluster-capi-operator/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/webhook.go:169 +0xfd
sigs.k8s.io/controller-runtime/pkg/webhook/admission.(*Webhook).ServeHTTP(0xc000630e80, {0x7f26f94b5580?, 0xc000f80280}, 0xc000750800)
    /go/src/github.com/openshift/cluster-capi-operator/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/admission/http.go:98 +0xeb5
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerInFlight.func1({0x7f26f94b5580, 0xc000f80280}, 0x1b7ff00?)
    /go/src/github.com/openshift/cluster-capi-operator/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:60 +0xd4
net/http.HandlerFunc.ServeHTTP(0x1b7ffb0?, {0x7f26f94b5580?, 0xc000f80280?}, 0x7afe60?)
    /usr/lib/golang/src/net/http/server.go:2109 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1({0x1b7ffb0?, 0xc000a72000?}, 0xc000750800)
    /go/src/github.com/openshift/cluster-capi-operator/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:146 +0xb8
net/http.HandlerFunc.ServeHTTP(0x0?, {0x1b7ffb0?, 0xc000a72000?}, 0xc00056f0e1?)
    /usr/lib/golang/src/net/http/server.go:2109 +0x2f
github.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func2({0x1b7ffb0, 0xc000a72000}, 0xc000750800)
    /go/src/github.com/openshift/cluster-capi-operator/vendor/github.com/prometheus/client_golang/prometheus/promhttp/instrument_server.go:108 +0xbf
net/http.HandlerFunc.ServeHTTP(0xc000a72000?, {0x1b7ffb0?, 0xc000a72000?}, 0x18e45d1?)
    /usr/lib/golang/src/net/http/server.go:2109 +0x2f
net/http.(*ServeMux).ServeHTTP(0xc00056f0c0?, {0x1b7ffb0, 0xc000a72000}, 0xc000750800)
    /usr/lib/golang/src/net/http/server.go:2487 +0x149
net/http.serverHandler.ServeHTTP({0x1b71dc8?}, {0x1b7ffb0, 0xc000a72000}, 0xc000750800)
    /usr/lib/golang/src/net/http/server.go:2947 +0x30c
net/http.(*conn).serve(0xc00039af00, {0x1b81198, 0xc000416c00})
    /usr/lib/golang/src/net/http/server.go:1991 +0x607
created by net/http.(*Server).Serve
    /usr/lib/golang/src/net/http/server.go:3102 +0x4db

Expected results:

The webhook should return an error rather than panic (a minimal sketch of the missing guard is shown below).
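A minimal sketch of the missing guard, assuming a controller-runtime CustomValidator-style webhook; the names and exact signature are assumptions, not the actual cluster-capi-operator code.

~~~
package webhook

import (
	"context"
	"errors"
	"fmt"

	"k8s.io/apimachinery/pkg/runtime"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

type ClusterWebhook struct{}

// ValidateCreate rejects Cluster objects with an empty spec instead of
// dereferencing a nil InfrastructureRef and panicking.
func (w *ClusterWebhook) ValidateCreate(ctx context.Context, obj runtime.Object) error {
	cluster, ok := obj.(*clusterv1.Cluster)
	if !ok {
		return fmt.Errorf("expected a Cluster object but got %T", obj)
	}
	if cluster.Spec.InfrastructureRef == nil {
		return errors.New("spec.infrastructureRef must be set")
	}
	// remaining checks can now safely read cluster.Spec.InfrastructureRef fields
	return nil
}
~~~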

Additional info:

 

As a developer, I would like the Getting Started page to use numbered list so that it is easier to point people to specific sections of the document.

As a developer, I would like the Contribute page to be a numbered list so that it is easier to point people to specific line items of the document.

Description of the problem:

We are turning on the feature-usage flag for custom manifests whenever we are creating a new custom cluster manifest. When we delete that manifest, the flag stays on.

 

Expected results:

Need to turn off the flag when deleting the custom manifest

Please review the following PR: https://github.com/openshift/cluster-machine-approver/pull/180

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

A user defined taints in a MachineSet and then scaled up the MachineSet. The instance can join the cluster and the Node becomes Ready, but pods cannot be deployed; checking the node YAML file shows that the uninitialized taint was not removed.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-20-215234

How reproducible:

Always

Steps to Reproduce:

1. Set up a cluster on Azure
2. Create a MachineSet with a taint
      taints:
      - effect: NoSchedule
        key: mapi
        value: mapi_test
3. Check the node YAML file

Actual results:

The uninitialized taint is still on the node, and no providerID is set on the node.
$ oc get node 
NAME                                              STATUS   ROLES                  AGE   VERSION
zhsun724-mh4dt-master-0                           Ready    control-plane,master   9h    v1.27.3+4aaeaec
zhsun724-mh4dt-master-1                           Ready    control-plane,master   9h    v1.27.3+4aaeaec
zhsun724-mh4dt-master-2                           Ready    control-plane,master   9h    v1.27.3+4aaeaec
zhsun724-mh4dt-worker-westus21-8rzqw              Ready    worker                 21m   v1.27.3+4aaeaec
zhsun724-mh4dt-worker-westus21-additional-q58zp   Ready    worker                 9h    v1.27.3+4aaeaec
zhsun724-mh4dt-worker-westus21-additional-vwwhh   Ready    worker                 9h    v1.27.3+4aaeaec
zhsun724-mh4dt-worker-westus21-v7k7s              Ready    worker                 9h    v1.27.3+4aaeaec
zhsun724-mh4dt-worker-westus22-ggxql              Ready    worker                 9h    v1.27.3+4aaeaec
zhsun724-mh4dt-worker-westus23-zf8l5              Ready    worker                 9h    v1.27.3+4aaeaec

$ oc edit node zhsun724-mh4dt-worker-westus21-8rzqw
spec:
  taints:
  - effect: NoSchedule
    key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
  - effect: NoSchedule
    key: mapi
    value: mapi_test

Expected results:

The uninitialized taint should be removed and the providerID should be set on the node (see the verification sketch below).
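Two quick checks for the expected state (the node name is the one from this report):

~~~
# providerID should be set:
$ oc get node zhsun724-mh4dt-worker-westus21-8rzqw -o jsonpath='{.spec.providerID}{"\n"}'
# the uninitialized taint should no longer be listed:
$ oc get node zhsun724-mh4dt-worker-westus21-8rzqw -o jsonpath='{range .spec.taints[*]}{.key}{"\n"}{end}'
~~~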

Additional info:

must-gather: https://drive.google.com/file/d/12ypYmHN98j9lyWCS9Dgaqq5MLpftqEkS/view?usp=sharing

Description of problem:

[HyperShift] KAS labels on created projects should be consistent with OCP: enforce: privileged

Version: 4.14.0-0.nightly-2023-10-10-084534

How reproducible: Always

Steps to Reproduce:

1. Install OCP cluster and hypershift operator
2. Create hosted cluster
3. Create a test project on hosted cluster

Actual results:

The hosted cluster KAS label on the test project is 'enforce: restricted':
$ oc new-project test1 --kubeconfig=guest.kubeconfig 
$ oc get ns test1 -oyaml --kubeconfig=guest.kubeconfig 
...
  labels:
    kubernetes.io/metadata.name: test1
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.24
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.24
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.24
  name: test1
...
 

Expected results:

The hosted cluster KAS label on projects should be "enforce: privileged", since the management cluster KAS label on created projects is "enforce: privileged".

Management cluster:

$ oc new-project test
$ oc get ns test -oyaml
...
  labels:
    kubernetes.io/metadata.name: test
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: v1.24
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: v1.24
  name: test
...
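A quick way to compare the two sides (kubeconfig and namespace names follow the ones used above):

~~~
# Hosted cluster: the enforce label currently shows "restricted"
$ oc get ns test1 --kubeconfig=guest.kubeconfig --show-labels
# Management cluster, for comparison
$ oc get ns test --show-labels
~~~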

Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-22758.

Description of problem:

The issue is regarding the Add Pipeline checkbox. When there are 2 pipelines displayed in the dropdown menu, selecting one unchecks the Add Pipeline checkbox.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always when 2 pipelines are in the ns

Steps to Reproduce:

1. Go to the Git Import Page. Create the application with Add Pipelines checked and a pipeline selected.
2. Go to the Serverless Function Page. Select Add Pipelines checkbox and try to select a pipeline from the drop-down. 

Actual results:

The Add Pipelines checkbox automatically gets unchecked on selecting a Pipeline from the drop-down (in case of multiple pipelines in the dropdown)

Expected results:

The Add Pipelines checkbox must not get un-checked.

Additional info:

Video Link: https://drive.google.com/file/d/1OPRXbMw-EiihO3LAlDiOsh8qvhhiJK5H/view?usp=sharing

As a user of the HyperShift CLI, I would like to be able to set the NodePool UpgradeType through a flag when either creating a new cluster or creating a new NodePool.


DoD:

  • A flag has been added to the create new cluster command allowing the NodePool UpgradeType to be set to either Replace or InPlace
  • A flag has been added to the create new NodePool command allowing the NodePool UpgradeType to be set to either Replace or InPlace
  • If either flag is not set, the default will be Replace as that is the current default
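For illustration, a hedged usage sketch of these flags; the flag name --node-upgrade-type and the rest of the command line are assumptions, not confirmed by this card:

~~~
# Hypothetical flag name, shown only to illustrate the intended UX
hypershift create cluster <platform> --name my-cluster --node-upgrade-type InPlace ...
hypershift create nodepool <platform> --cluster-name my-cluster --name extra-pool --node-upgrade-type Replace ...
~~~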

Please review the following PR: https://github.com/openshift/apiserver-network-proxy/pull/30

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Not able to import a repository that has a .tekton directory and a func.yaml file present; the form fails with the error `Cannot read properties of undefined (reading 'filter')`.

Version-Release number of selected component (if applicable):

4.13, Pipeline and Serverless is installed

How reproducible:

 

Steps to Reproduce:

1. In the Import from Git form, enter the Git URL: https://github.com/Lucifergene/oc-pipe-func
2. Pipeline is checked and the PAC option is selected by default; even if the user unchecks the Pipeline option, the same error occurs
3. Click the Create button

Actual results:

Not able to import; the form shows the error `Cannot read properties of undefined (reading 'filter')`.

Expected results:

The import should succeed without any error.

Additional info:

 

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Azure MAG install fails with the Terraform error 'Error ensuring Resource Providers are registered'.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-27-172239 

How reproducible:

Always

Steps to Reproduce:

1. Create MAG Azure cluster with IPI 

Actual results:

The installer fails during 'Creating infrastructure resources…'.

In terraform.log: 
2023-07-29T11:33:02.938Z [ERROR] provider.terraform-provider-azurerm: Response contains error diagnostic: @module=sdk.proto tf_proto_version=5.3 tf_provider_addr=provider tf_req_id=45c10824-360b-b211-1ba1-9c3a722014af @caller=/go/src/github.com/openshift/installer/terraform/providers/azurerm/vendor/github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/diag/diagnostics.go:55 diagnostic_detail= diagnostic_severity=ERROR diagnostic_summary="Error ensuring Resource Providers are registered.Terraform automatically attempts to register the Resource Providers it supports to
ensure it's able to provision resources.If you don't have permission to register Resource Providers you may wish to use the
"skip_provider_registration" flag in the Provider block to disable this functionality.Please note that if you opt out of Resource Provider Registration and Terraform tries
to provision a resource from a Resource Provider which is unregistered, then the errors
may appear misleading - for example:> API version 2019-XX-XX was not found for Microsoft.FooCould indicate either that the Resource Provider "Microsoft.Foo" requires registration,
but this could also indicate that this Azure Region doesn't support this API version.More information on the "skip_provider_registration" flag can be found here:
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs#skip_provider_registrationOriginal Error: determining which Required Resource Providers require registration: the required Resource Provider "Microsoft.CustomProviders" wasn't returned from the Azure API" tf_rpc=Configure timestamp=2023-07-29T11:33:02.937Z
2023-07-29T11:33:02.938Z [ERROR] vertex "provider[\"openshift/local/azurerm\"]" error: Error ensuring Resource Providers are registered.Terraform automatically attempts to register the Resource Providers it supports to
ensure it's able to provision resources.If you don't have permission to register Resource Providers you may wish to use the
"skip_provider_registration" flag in the Provider block to disable this functionality.Please note that if you opt out of Resource Provider Registration and Terraform tries
to provision a resource from a Resource Provider which is unregistered, then the errors
may appear misleading - for example:> API version 2019-XX-XX was not found for Microsoft.FooCould indicate either that the Resource Provider "Microsoft.Foo" requires registration,
but this could also indicate that this Azure Region doesn't support this API version.More information on the "skip_provider_registration" flag can be found here:
https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs#skip_provider_registrationOriginal Error: determining which Required Resource Providers require registration: the required Resource Provider "Microsoft.CustomProviders" wasn't returned from the Azure API

Expected results:

The installation should succeed.

Additional info:

Suspect the issue was introduced by https://github.com/openshift/installer/pull/7205/; IPI install on Azure MAG with 4.14.0-0.nightly-2023-07-27-051258 is OK.

Please review the following PR: https://github.com/openshift/cluster-ingress-operator/pull/898

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Please review the following PR: https://github.com/openshift/ibm-vpc-block-csi-driver/pull/36

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

    

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:
As the title says, the upgrade failed and the message shows that cluster-backup fails.

$oc get pods -n openshift-etcd
NAME                                                                  READY   STATUS      RESTARTS   AGE
cluster-backup                                                        0/1     Error       0          10s

$oc logs -f cluster-backup
Defaulted container "cluster-backup" out of: cluster-backup, verify-storage (init)
+ '[' -n /etc/kubernetes/cluster-backup ']'
+ echo 'removing all backups in /etc/kubernetes/cluster-backup'
+ rm -rf /etc/kubernetes/cluster-backup
removing all backups in /etc/kubernetes/cluster-backup
rm: cannot remove '/etc/kubernetes/cluster-backup': Device or resource busy

How reproducible:
Always

Steps to Reproduce:
1. upgrade from 4.14.7 to 4.15.0-rc.0
2. upgrade failed:

 message: |-
      Preconditions failed for payload loaded version="4.15.0-rc.0" image="quay.io/openshift-release-dev/ocp-release@sha256:a3b96bf3aef71d4062cae3961fcba3bb3019b52ab2da60b2997d4842f9cdcf07": Multiple precondition checks failed:
      * Precondition "ClusterVersionUpgradeable" failed because of "PoolUpdating": Cluster operator machine-config should not be upgraded between minor versions: One or more machine config pools are updating, please see `oc get mcp` for further details
      * Precondition "EtcdRecentBackup" failed because of "UpgradeBackupFailed": RecentBackup: pod failed within retry duration: delete skipped
    reason: PreconditionChecks
    status: "False"
    type: ReleaseAccepted

Actual results:

The upgrade fails as described above.

Expected results:

The upgrade succeeds.
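One possible mitigation sketch, assuming /etc/kubernetes/cluster-backup is itself a mounted volume in the backup pod (which would explain "Device or resource busy" on the directory): delete its contents rather than the directory itself.

~~~
# Sketch only, not the shipped cluster-backup script
backup_dir=/etc/kubernetes/cluster-backup
if [ -n "${backup_dir}" ]; then
  echo "removing all backups in ${backup_dir}"
  # removing the contents avoids "Device or resource busy" on the mount point
  find "${backup_dir}" -mindepth 1 -delete
fi
~~~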

Description of problem:

The IPv6 interface and IP are missing in all pods created in OCP 4.12 EC-2.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Every time

Steps to Reproduce:

We create network-attachment-definitions.k8s.cni.cncf.io in the OCP cluster at namespace scope so that our software pods get IPv6 IPs.

Actual results:

Pods do not receive IPv6 addresses

Expected results:

Pods receive IPv6 addresses

Additional info:

This has been working flawlessly up to OCP 4.10.21; however, with the same configuration on OCP 4.12-ec2 we notice that all our pods are missing the IPv6 address, and we have to restart the pods a couple of times for them to get one.
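For context, a hedged example of the kind of namespace-scoped attachment described above; the name, master interface, IPAM type, and address range are illustrative assumptions, not the actual definition in use.

~~~
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ipv6-net
  namespace: my-app
spec:
  config: |-
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth0",
      "ipam": {
        "type": "whereabouts",
        "range": "fd00:10:1::/64"
      }
    }
~~~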

Description of problem:

Metrics page is broken

Version-Release number of selected component (if applicable):

Openshift Pipelines 1.9.0 on 4.12

How reproducible:

Always

Steps to Reproduce:

1. Install Openshift Pipelines 1.9.0
2. Create a pipeline and run it several times
3. Update metrics.pipelinerun.duration-type and metrics.taskrun.duration-type to lastvalue
4. Navigate to created pipeline 
5. Switch to Metrics tab

Actual results:

The Metrics page is showing error

Expected results:

Metrics of the pipeline should be shown

Additional info:

 

Description of problem:

We merged a change into origin to modify a test so that `/readyz` would be used as the health check path. It turns out this makes things worse because we want to use kube-proxy's health probe endpoint to monitor the node health, and kube-proxy only exposes `/healthz` which is the default path anyway.

We should remove the annotation added to change the path and go back to the defaults.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Description of the problem:

In Staging, BE 2.20.1: trying to turn the "Integrate with platform" switch on results in:

Failed to update the cluster
only x86-64 CPU architecture is supported on Nutanix clusters 

How reproducible:

100%

Steps to reproduce:

1. Create new cluster with OCP multi version

2. Discover NTNX hosts and turn integrate with platform on

3.

Actual results:

 

Expected results:

Please review the following PR: https://github.com/openshift/ibm-powervs-block-csi-driver/pull/28

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When an HCP Service LB is created, for example for an IngressController, the CAPA controller calls ModifyNetworkInterfaceAttribute. It references the default security group for the VPC in addition to the security group created for the cluster (with the right tags). Ideally, the LBs (and any other HCP components) should not use the default VPC security group.

Version-Release number of selected component (if applicable):

All 4.12 and 4.13

How reproducible:

100%

Steps to Reproduce:

1. Create HCP
2. Wait for Ingress to come up.
3. Look in CloudTrail for ModifyNetworkInterfaceAttribute, and see default security group referenced 

Actual results:

Default security group is used

Expected results:

Default security group should not be used

Additional info:

This is problematic as we are attempting to scope our AWS permissions as small as possible. The goal is to only use resources that are tagged with `red-hat-managed: true` so that our IAM Policies can conditioned to only access these resources. Using the Security Group created for the cluster should be sufficient, and the default Security Group does not need to be used, so if the usage can be removed here, we can secure our AWS policies that much better. Similar to OCPBUGS-11894

This is a clone of issue OCPBUGS-7893. The following is the description of the original issue:

Description of problem:
The TaskRun duration diagram on the "Metrics" tab of a pipeline is set to show only 4 TaskRuns in the legend, regardless of the number of TaskRuns on the diagram.

 

 

Expected results:

All TaskRuns should be displayed in the legend.

Description of problem:

The certificates synced by MCO in 4.13 onwards are more comprehensive and correct, and out of sync issues will surface much faster.

See https://issues.redhat.com/browse/MCO-499 for details

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1.Install 4.13, pause MCPs
2.
3.

Actual results:

Within ~24 hours the cluster will fire critical clusterdown alerts

Expected results:

No alerts fire

Additional info:

 

This is a clone of issue OCPBUGS-18830. The following is the description of the original issue:

Description of problem:

Failed to install cluster on SC2S region as:

level=error msg=Error: reading Security Group (sg-0b0cd054dd599602f) Rules: UnsupportedOperation: The functionality you requested is not available in this region. 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-09-11-201102
 

How reproducible:

Always
 

Steps to Reproduce:

1. Create an OCP cluster on SC2S

Actual results:

Install fail:
level=error msg=Error: reading Security Group (sg-0b0cd054dd599602f) Rules: UnsupportedOperation: The functionality you requested is not available in this region.

Expected results:

Install succeed.
 

Additional info:

* C2S region is not affected

Description of problem:

On the prerelease doc "Configure a secondary external gateway", in step 3 we state that the output of the following command should confirm the admin policy has been created:

#oc describe apbexternalroute <name> | tail -n 6

First of all, this is a typo: there is no "apbexternalroute"; the correct term is "adminpolicybasedexternalroutes". Even with the correct term, the resulting output says almost nothing about the status of the policy; it just reports on the policy itself plus some minor details like timestamps.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-04-143709

How reproducible:

Every time

Steps to Reproduce:

1. Deploy a cluster
2. Boot up a pod under a namespace
3. $ cat 4.create.abp_static_bar1.yaml  later apply said policy
apiVersion: k8s.ovn.org/v1
kind: AdminPolicyBasedExternalRoute
metadata:
  name: first-policy
spec:
## gateway example
  from:
    namespaceSelector:
      matchLabels:
          kubernetes.io/metadata.name: bar
  nextHops:       
    static:
      - ip: "173.20.0.8"
      - ip: "173.20.0.9"
4. Confirm the policy is in place: $ oc get adminpolicybasedexternalroutes.k8s.ovn.org
NAME           LAST UPDATE   STATUS
first-policy   

5. But how do we test the policy's status?
The doc's guide doesn't help much: $ oc describe adminpolicybasedexternalroutes.k8s.ovn.org <name> | tail -n 6

$ oc describe adminpolicybasedexternalroutes.k8s.ovn.org first-policy 
Name:         first-policy
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  k8s.ovn.org/v1
Kind:         AdminPolicyBasedExternalRoute
Metadata:
  Creation Timestamp:  2023-10-30T20:09:20Z
  Generation:          1
  Resource Version:    10904672
  UID:                 3c4a60da-a618-45b1-94a8-2085dcdc5631
Spec:
  From:
    Namespace Selector:
      Match Labels:
        kubernetes.io/metadata.name:  bar
  Next Hops:
    Static:
      Bfd Enabled:  false
      Ip:           173.20.0.8
      Bfd Enabled:  false
      Ip:           173.20.0.9
Events:             <none>
 

Nothing regarding the policy status shows up, if this is even supported at all. Other than fixing the doc, if there is a way to view the status it should be documented. One more thing: if there is indeed a policy status, shouldn't it also populate the STATUS column here:

$ oc get adminpolicybasedexternalroutes.k8s.ovn.org 
NAME           LAST UPDATE   STATUS
first-policy                   ^ 

Asking because on another bug, https://issues.redhat.com/browse/OCPBUGS-22706, I recreated a situation where the status should have reported an error, yet it never did, nor does it update the above table. Come to think of it, the LAST UPDATE column has never exposed any data either, in which case why do we even have these two columns to begin with?

Actual results:

 

Expected results:

 

Additional info:

 

Because of the pins in the packages list, the ART pipeline is rebuilding packages all the time. Unfortunately, we need to remove the strong pins and move back to relaxed ones.

Once that's done, we need to merge https://github.com/openshift-eng/ocp-build-data/pull/4097

The installer offers a graph command to output its internal dependency graph. It could be useful to have a similar command, i.e. agent graph, to output the agent-specific dependency graph.

Please review the following PR: https://github.com/openshift/prometheus-alertmanager/pull/70

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

==> Description of problem:

"Import from git" functionality with a local Bitbucket instance does not work, due to repository validation that requires to repository to be hosted on Bitbucket Cloud. [1][2]

[1] https://github.com/openshift/console/blob/release-4.10/frontend/packages/git-service/src/services/bitbucket-service.ts#L63

[2] https://github.com/openshift/console/blob/release-4.10/frontend/packages/git-service/src/services/bitbucket-service.ts#L18

==> Version-Release number of selected component (if applicable):

Tested in OCP 4.10

==> How reproducible: 100%

==> Steps to Reproduce:
1. Go to: Developer View > Add+ > From Git
2. Fill the "Git Repo URL" field with the BitBucket repo URL (i.e. http://<bitbucket_url>/scm/<project>/<repository>.git)
3. Select BitBucket from the "Git type" dropdowns button

==> Actual results:
"URL is valid but cannot be reached. If this is a private repository, enter a source Secret in advanced Git options"

==> Expected results:

This functionality should also work with self-hosted Bitbucket.

==> Additional info:

To retrieve slug information from hosted BitBucket we can query: http://<bitbucket_url>/rest/api/1.0/projects/<project>/repos/<repository>

An example:

~~~
curl -ks http://bitbucket-server-bitbucket.apps.gmeghnag.lab.cluster/rest/api/1.0/projects/test/repos/test-repo | jq
{
  "slug": "test-repo",
  "id": 1,
  "name": "test-repo",
  "hierarchyId": "28fc5c8782050b43e223",
  "scmId": "git",
  "state": "AVAILABLE",
  "statusMessage": "Available",
  "forkable": true,
  "project": {
    "key": "TEST",
    "id": 1,
    "name": "test",
    "public": false,
    "type": "NORMAL",
    "links": {
      "self": [
        { "href": "http://bitbucket-server-bitbucket.apps.gmeghnag.lab.cluster/projects/TEST" }
      ]
    }
  },
  "public": true,
  "archived": false,
  "links": {
    "clone": [
      { "href": "http://bitbucket-server-bitbucket.apps.gmeghnag.lab.cluster/scm/test/test-repo.git", "name": "http" },
      { "href": "ssh://git@bitbucket-server-bitbucket.apps.gmeghnag.lab.cluster:7999/test/test-repo.git", "name": "ssh" }
    ],
    "self": [
      { "href": "http://bitbucket-server-bitbucket.apps.gmeghnag.lab.cluster/projects/TEST/repos/test-repo/browse" }
    ]
  }
}
~~~

This is a manual clone of https://issues.redhat.com/browse/OCPBUGS-18902 for backporting purposes.

 

In this recently merged PR, a number of API calls do not use caches, causing excessive calls to the API server.

Done when:

- Change all Get() calls to use listers (see the sketch below)

- The API call metric should decrease
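A minimal sketch of the change pattern, assuming client-go shared informers; the names are illustrative and not taken from the PR.

~~~
package listersketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
)

// Before: every read is a live call against the API server.
func getConfigMapDirect(c kubernetes.Interface, ns, name string) (*corev1.ConfigMap, error) {
	return c.CoreV1().ConfigMaps(ns).Get(context.TODO(), name, metav1.GetOptions{})
}

// After: reads are served from the shared informer cache via a lister, so
// repeated Get() calls no longer hit the API server.
func getConfigMapCached(f informers.SharedInformerFactory, ns, name string) (*corev1.ConfigMap, error) {
	return f.Core().V1().ConfigMaps().Lister().ConfigMaps(ns).Get(name)
}
~~~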

Please review the following PR: https://github.com/openshift/cluster-openshift-apiserver-operator/pull/531

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

[CSI Inline Volume admission plugin] When using a Deployment/StatefulSet/DaemonSet workload with an inline volume, audit logs/warnings are not recorded correctly.

Version-Release number of selected component (if applicable):

4.13.0-0.ci.test-2023-03-02-013814-ci-ln-yd4m4st-latest (nightly build also could be reproduced)

How reproducible:

Always

Steps to Reproduce:

1. Enable feature gate to auto install the csi.sharedresource csi driver

2. Add security.openshift.io/csi-ephemeral-volume-profile: privileged to the CSIDriver 'csi.sharedresource.openshift.io'

# scale down the cvo, cso and shared-resource-csi-driver-operator
$ oc scale --replicas=0 deploy/cluster-version-operator -n openshift-cluster-version
deployment.apps/cluster-version-operator scaled
$ oc scale --replicas=0 deploy/cluster-storage-operator -n openshift-cluster-storage-operator
deployment.apps/cluster-storage-operator scaled
$ oc scale --replicas=0 deploy/shared-resource-csi-driver-operator -n openshift-cluster-csi-drivers
deployment.apps/shared-resource-csi-driver-operator scaled

# Add security.openshift.io/csi-ephemeral-volume-profile: privileged to the CSIDriver
$ oc get csidriver/csi.sharedresource.openshift.io -o yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  annotations:
    csi.openshift.io/managed: "true"
    operator.openshift.io/spec-hash: 4fc61ff54015a7e91e07b93ac8e64f46983a59b4b296344948f72187e3318b33
  creationTimestamp: "2022-10-26T08:10:23Z"
  labels:
    security.openshift.io/csi-ephemeral-volume-profile: privileged

3. Create different workloads with inline volume in a restricted namespace
$ oc apply -f examples/simple 
role.rbac.authorization.k8s.io/shared-resource-my-share-pod created 
rolebinding.rbac.authorization.k8s.io/shared-resource-my-share-pod created
configmap/my-config created
sharedconfigmap.sharedresource.openshift.io/my-share-pod created
Error from server (Forbidden): error when creating "examples/simple/03-pod.yaml": pods "my-csi-app-pod" is forbidden: admission denied: pod my-csi-app-pod uses an inline volume provided by CSIDriver csi.sharedresource.openshift.io and namespace my-csi-app-namespace has a pod security enforce level that is lower than privileged 
Error from server (Forbidden): error when creating "examples/simple/04-deployment.yaml": deployments.apps "mydeployment" is forbidden: admission denied: pod  uses an inline volume provided by CSIDriver csi.sharedresource.openshift.io and namespace my-csi-app-namespace has a pod security enforce level that is lower than privileged 
Error from server (Forbidden): error when creating "examples/simple/05-statefulset.yaml": statefulsets.apps "my-sts" is forbidden: admission denied: pod  uses an inline volume provided by CSIDriver csi.sharedresource.openshift.io and namespace my-csi-app-namespace has a pod security enforce level that is lower than privileged

4.  Add enforce: privileged label to the test ns and create different workloads with inline volume again 
$ oc label ns/my-csi-app-namespace security.openshift.io/scc.podSecurityLabelSync=false pod-security.kubernetes.io/enforce=privileged pod-security.kubernetes.io/audit=restricted pod-security.kubernetes.io/warn=restricted --overwrite
namespace/my-csi-app-namespace labeled

$ oc apply -f examples/simple                    
role.rbac.authorization.k8s.io/shared-resource-my-share-pod created
rolebinding.rbac.authorization.k8s.io/shared-resource-my-share-pod created
configmap/my-config created
sharedconfigmap.sharedresource.openshift.io/my-share-pod created
Warning: pod my-csi-app-pod uses an inline volume provided by CSIDriver csi.sharedresource.openshift.io and namespace my-csi-app-namespace has a pod security warn level that is lower than privileged
pod/my-csi-app-pod created
Warning: pod  uses an inline volume provided by CSIDriver csi.sharedresource.openshift.io and namespace my-csi-app-namespace has a pod security warn level that is lower than privileged
deployment.apps/mydeployment created
daemonset.apps/my-ds created
statefulset.apps/my-sts created

$ oc get po                                               
NAME                            READY   STATUS    RESTARTS   AGE
my-csi-app-pod                  1/1     Running   0          34s
my-ds-cw4k7                     1/1     Running   0          32s
my-ds-sv9vp                     1/1     Running   0          32s
my-ds-v7f9m                     1/1     Running   0          32s
my-sts-0                        1/1     Running   0          31s
mydeployment-664cd95cb4-4s2cd   1/1     Running   0          33s

5. Check the api-server audit logs
$ oc adm node-logs ip-10-0-211-240.us-east-2.compute.internal --path=kube-apiserver/audit.log | grep 'uses an inline volume provided by'| tail -1 | jq . | grep 'CSIInlineVolumeSecurity'
    "storage.openshift.io/CSIInlineVolumeSecurity": "pod  uses an inline volume provided by CSIDriver csi.sharedresource.openshift.io and namespace my-csi-app-namespace has a pod security audit level that is lower than privileged"

Actual results:

In steps 3 and 4: for Deployment workloads, the pod name in the warning is empty;
for StatefulSet/DaemonSet workloads, the warning is not displayed at all.
In step 5: in the audit logs, the pod name is empty.

Expected results:

In steps 3 and 4: for Deployment workloads, the pod name in the warning should be present;
for StatefulSet/DaemonSet workloads, the warning should be displayed.
In step 5: the pod name in the audit logs should not be empty; the workload type and the specific pod names should be recorded.

Additional info:

Testdata:
https://github.com/Phaow/csi-driver-shared-resource/tree/test-inlinevolume/examples/simple

This is the downstreaming issue for the upstream operator-registry changes. Upstream olm-docs repo will be downstreamed as part of later docs updates.
https://docs.google.com/document/d/139yXeOqAJbV1ndC7Q4NbaOtzbSdNpcuJan0iemORd3g/

-------------------------------------------

 

Veneer is viewed as a confusing and counter-intuitive term.  PM floated `catalog template` (`template` for short) as a replacement and it's resonated sufficiently with folks that we want to update references/commands to use the new term. 

 

A/C:

  • updates to all upstream docs (olm.operatorframework.io)
  • updates to hackmd references (hierarchy head at https://hackmd.io/O-DelGCnRbSmioFYnuBqkA)
  • updates to operator-registry commands (strongly prefer to also make changes to code paths, module names, etc. to make the change consistently)
  • updates to the generated demo for semver (or deletion.... really, the thing here is to be consistent)
  • Docs audit (collaboration with docs Michael Peter and Alex Dellapenta )
  • creation of a new downstreaming story to populate the changes to master, 4.12 so that early adopters aren't ambushed by what is merely a name change.

 

 

 

 

Description of problem:
Component Readiness is showing a regression in 4.14 compared to 4.13 in the rt variant of test Cluster resource quota should control resource limits across namespaces. Example

{  fail [github.com/openshift/origin/test/extended/quota/clusterquota.go:107]: unexpected error: timed out waiting for the condition
Ginkgo exit error 1: exit with code 1}
 

Looker studio graph (scroll down to see) shows the regression started around May 24th.

Version-Release number of selected component (if applicable):

 

How reproducible:
4.13 Sippy shows 100% success rate vs. 4.14 which is down to about 91%

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

Historical pass rate was 100%

Additional info:

 

This is a clone of issue OCPBUGS-18907. The following is the description of the original issue:

Description of problem:

Follow-on to https://issues.redhat.com/browse/OCPBUGS-17827

jiezhao-mac:hypershift jiezhao$ oc get hostedcluster -n clusters
NAME       VERSION                              KUBECONFIG                  PROGRESS    AVAILABLE   PROGRESSING   MESSAGE
jie-test   4.14.0-0.nightly-2023-09-12-024050   jie-test-admin-kubeconfig   Completed   True        False         The hosted control plane is available
jiezhao-mac:hypershift jiezhao$ 
jiezhao-mac:hypershift jiezhao$ oc get pods -n clusters-jie-test | grep router
router-78d47f4c69-2mvbp                               1/1     Running            0          68m
jiezhao-mac:hypershift jiezhao$ 
jiezhao-mac:hypershift jiezhao$ oc get pods router-78d47f4c69-2mvbp -n clusters-jie-test -ojsonpath='{.metadata.labels}' | jq
{
  "app": "private-router",
  "hypershift.openshift.io/hosted-control-plane": "clusters-jie-test",
  "hypershift.openshift.io/request-serving-component": "true",
  "pod-template-hash": "78d47f4c69"
}
jiezhao-mac:hypershift jiezhao$ oc get networkpolicy management-kas  -n clusters-jie-test
NAME             POD-SELECTOR                                                                                   AGE
management-kas   !hypershift.openshift.io/need-management-kas-access,name notin (aws-ebs-csi-driver-operator)   76m
jiezhao-mac:hypershift jiezhao$ 
jiezhao-mac:hypershift jiezhao$ oc get networkpolicy management-kas  -n clusters-jie-test -o yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  annotations:
    hypershift.openshift.io/cluster: clusters/jie-test
  creationTimestamp: "2023-09-12T14:43:13Z"
  generation: 1
  name: management-kas
  namespace: clusters-jie-test
  resourceVersion: "54049"
  uid: 72288fed-a1f6-4dc9-bb63-981d7cdd479f
spec:
  egress:
  - to:
    - podSelector: {}
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 10.0.46.47/32
        - 10.0.7.159/32
        - 10.0.77.20/32
        - 10.128.0.0/14
  - ports:
    - port: 5353
      protocol: UDP
    - port: 5353
      protocol: TCP
    to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-dns
  podSelector:
    matchExpressions:
    - key: hypershift.openshift.io/need-management-kas-access
      operator: DoesNotExist
    - key: name
      operator: NotIn
      values:
      - aws-ebs-csi-driver-operator
  policyTypes:
  - Egress
status: {}
jiezhao-mac:hypershift jiezhao$ 
jiezhao-mac:hypershift jiezhao$ oc get endpoints -n default kubernetes
NAME         ENDPOINTS                                         AGE
kubernetes   10.0.46.47:6443,10.0.7.159:6443,10.0.77.20:6443   150m
jiezhao-mac:hypershift jiezhao$ 
jiezhao-mac:hypershift jiezhao$ oc get endpoints -n default kubernetes -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  creationTimestamp: "2023-09-12T13:32:47Z"
  labels:
    endpointslice.kubernetes.io/skip-mirror: "true"
  name: kubernetes
  namespace: default
  resourceVersion: "31961"
  uid: bc170a67-018f-4490-a18c-811ebd3f3676
subsets:
- addresses:
  - ip: 10.0.46.47
  - ip: 10.0.7.159
  - ip: 10.0.77.20
  ports:
  - name: https
    port: 6443
    protocol: TCP
jiezhao-mac:hypershift jiezhao$ 
jiezhao-mac:hypershift jiezhao$ oc get endpoints -n default kubernetes -ojsonpath='{.subsets[].addresses[].ip}{"\n"}'
10.0.46.47
jiezhao-mac:hypershift jiezhao$ 
jiezhao-mac:hypershift jiezhao$ oc get endpoints -n default kubernetes -ojsonpath='{.subsets[].ports[].port}{"\n"}'
6443
jiezhao-mac:hypershift jiezhao$ 
jiezhao-mac:hypershift jiezhao$ oc project clusters-jie-test
Now using project "clusters-jie-test" on server "https://api.jiezhao-091201.qe.devcluster.openshift.com:6443".
jiezhao-mac:hypershift jiezhao$ 
jiezhao-mac:hypershift jiezhao$ oc -n clusters-jie-test rsh pod/router-78d47f4c69-2mvbp curl --connect-timeout 2 -Iks https://10.0.46.47:6443 -v 
* Rebuilt URL to: https://10.0.46.47:6443/
*   Trying 10.0.46.47...
* TCP_NODELAY set
* Connected to 10.0.46.47 (10.0.46.47) port 6443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=172.30.0.1
*  start date: Sep 12 13:35:51 2023 GMT
*  expire date: Oct 12 13:35:52 2023 GMT
*  issuer: OU=openshift; CN=kube-apiserver-service-network-signer
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* Using Stream ID: 1 (easy handle 0x55c5c46cb990)
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> HEAD / HTTP/2
> Host: 10.0.46.47:6443
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), TLS handshake, [no content] (0):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* Connection state changed (MAX_CONCURRENT_STREAMS == 2000)!
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
* TLSv1.3 (IN), TLS app data, [no content] (0):
< HTTP/2 403 
HTTP/2 403 
< audit-id: 82d5f3f7-6e5b-4bb5-b846-54df09aefb54
audit-id: 82d5f3f7-6e5b-4bb5-b846-54df09aefb54
< cache-control: no-cache, private
cache-control: no-cache, private
< content-type: application/json
content-type: application/json
< strict-transport-security: max-age=31536000; includeSubDomains; preload
strict-transport-security: max-age=31536000; includeSubDomains; preload
< x-content-type-options: nosniff
x-content-type-options: nosniff
< x-kubernetes-pf-flowschema-uid: 6edd6532-2d15-4d8d-9cea-4dcce99b881f
x-kubernetes-pf-flowschema-uid: 6edd6532-2d15-4d8d-9cea-4dcce99b881f
< x-kubernetes-pf-prioritylevel-uid: 4115bb59-a78d-42ab-9136-37529cf107e1
x-kubernetes-pf-prioritylevel-uid: 4115bb59-a78d-42ab-9136-37529cf107e1
< content-length: 218
content-length: 218
< date: Tue, 12 Sep 2023 16:05:02 GMT
date: Tue, 12 Sep 2023 16:05:02 GMT
< 
* Connection #0 to host 10.0.46.47 left intact
jiezhao-mac:hypershift jiezhao$ 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-13044. The following is the description of the original issue:

Description of problem:

During cluster installations/upgrades with an imageContentSourcePolicy in place but with access to quay.io, the ICSP is not honored for the machine-os-content image, which is pulled from quay.io instead of the private registry.

Version-Release number of selected component (if applicable):

$ oc logs -n openshift-machine-config-operator ds/machine-config-daemon -c machine-config-daemon|head -1
Found 6 pods, using pod/machine-config-daemon-znknf
I0503 10:53:00.925942    2377 start.go:112] Version: v4.12.0-202304070941.p0.g87fedee.assembly.stream-dirty (87fedee690ae487f8ae044ac416000172c9576a5)

How reproducible:

100% in clusters with ICSP configured BUT with access to quay.io

Steps to Reproduce:

1. Create mirror repo:
$ cat <<EOF > /tmp/isc.yaml                                                    
kind: ImageSetConfiguration
apiVersion: mirror.openshift.io/v1alpha2
archiveSize: 4
storageConfig:
  registry:
    imageURL: quay.example.com/mirror/oc-mirror-metadata
    skipTLS: true
mirror:
  platform:
    channels:
    - name: stable-4.12
      type: ocp
      minVersion: 4.12.13
    graph: true
EOF
$ oc mirror --dest-skip-tls  --config=/tmp/isc.yaml docker://quay.example.com/mirror/oc-mirror-metadata
<...>
info: Mirroring completed in 2m27.91s (138.6MB/s)
Writing image mapping to oc-mirror-workspace/results-1683104229/mapping.txt
Writing UpdateService manifests to oc-mirror-workspace/results-1683104229
Writing ICSP manifests to oc-mirror-workspace/results-1683104229

2. Confirm machine-os-content digest:
$ oc adm release info 4.12.13 -o jsonpath='{.references.spec.tags[?(@.name=="machine-os-content")].from}'|jq
{
  "kind": "DockerImage",
  "name": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a1660c8086ff85e569e10b3bc9db344e1e1f7530581d742ad98b670a81477b1b"
}
$ oc adm release info 4.12.14 -o jsonpath='{.references.spec.tags[?(@.name=="machine-os-content")].from}'|jq
{
  "kind": "DockerImage",
  "name": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ed68d04d720a83366626a11297a4f3c5761c0b44d02ef66fe4cbcc70a6854563"
}

3. Create 4.12.13 cluster with ICSP at install time:
$ grep imageContentSources -A6 ./install-config.yaml
imageContentSources:
  - mirrors:
    - quay.example.com/mirror/oc-mirror-metadata/openshift/release
    source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
  - mirrors:
    - quay.example.com/mirror/oc-mirror-metadata/openshift/release-images
    source: quay.io/openshift-release-dev/ocp-release


Actual results:

1. After the installation is completed, no pulls for a166 (4.12.13-x86_64-machine-os-content) are logged in the Quay usage logs, whereas e.g. digest 22d2 (4.12.13-x86_64-machine-os-images) is reported to be pulled from the mirror.

2. After upgrading to 4.12.14, no pulls for ed68 (4.12.14-x86_64-machine-os-content) are logged in the mirror registry, even though the image was pulled as part of `oc image extract` in the machine-config-daemon:

[core@master-1 ~]$ sudo less /var/log/pods/openshift-machine-config-operator_machine-config-daemon-7fnjz_e2a3de54-1355-44f9-a516-2f89d6c6ab8f/machine-config-daemon/0.log
2023-05-03T10:51:43.308996195+00:00 stderr F I0503 10:51:43.308932   11290 run.go:19] Running: nice -- ionice -c 3 oc image extract -v 10 --path /:/run/mco-extensions/os-extensions-content-4035545447 --registry-config /var/lib/kubelet/config.json quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:ad48fe01f3e82584197797ce2151eecdfdcce67ae1096f06412e5ace416f66ce
2023-05-03T10:51:43.418211869+00:00 stderr F I0503 10:51:43.418008  184455 client_mirrored.go:174] Attempting to connect to quay.io/openshift-release-dev/ocp-v4.0-art-dev
2023-05-03T10:51:43.418211869+00:00 stderr F I0503 10:51:43.418174  184455 round_trippers.go:466] curl -v -XGET  -H "User-Agent: oc/4.12.0 (linux/amd64) kubernetes/31aa3e8" 'https://quay.io/v2/'
2023-05-03T10:51:43.419618513+00:00 stderr F I0503 10:51:43.419517  184455 round_trippers.go:495] HTTP Trace: DNS Lookup for quay.io resolved to [{34.206.15.82 } {54.209.210.231 } {52.5.187.29 } {52.3.168.193 } {52.21.36.23 } {50.17.122.58 } {44.194.68.221 } {34.194.241.136 } {2600:1f18:483:cf01:ebba:a861:1150:e245 } {2600:1f18:483:cf02:40f9:477f:ea6b:8a2b } {2600:1f18:483:cf02:8601:2257:9919:cd9e } {2600:1f18:483:cf01:8212:fcdc:2a2a:50a7 } {2600:1f18:483:cf00:915d:9d2f:fc1f:40a7 } {2600:1f18:483:cf02:7a8b:1901:f1cf:3ab3 } {2600:1f18:483:cf00:27e2:dfeb:a6c7:c4db } {2600:1f18:483:cf01:ca3f:d96e:196c:7867 }]
2023-05-03T10:51:43.429298245+00:00 stderr F I0503 10:51:43.429151  184455 round_trippers.go:510] HTTP Trace: Dial to tcp:34.206.15.82:443 succeed

Expected results:

All images are pulled from the location as configured in the ICSP.

Additional info:
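Not part of the original report: one way to confirm whether the ICSP mirror configuration was actually rendered on a node is to inspect its registries.conf (a minimal check; the node name is a placeholder):

```
$ oc debug node/<node-name> -- chroot /host cat /etc/containers/registries.conf
# The [[registry]] entry for quay.io/openshift-release-dev/ocp-v4.0-art-dev should
# list quay.example.com/mirror/oc-mirror-metadata/openshift/release as a mirror.
```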

 

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/222

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem

When the ingress operator creates or updates a router deployment that specifies spec.template.spec.hostNetwork: true, the operator does not set spec.template.spec.containers[*].ports[*].hostPort. As a result, the API sets each port's hostPort field to the port's containerPort field value. The operator detects this as an external update and attempts to revert it. The operator should not update the deployment in response to API defaulting.

Version-Release number of selected component (if applicable)

I observed this in CI for OCP 4.14 and was able to reproduce the issue on OCP 4.11.37. The problematic code was added in https://github.com/openshift/cluster-ingress-operator/pull/694/commits/af653f9fa7368cf124e11b7ea4666bc40e601165 in OCP 4.11 to implement NE-674.

How reproducible

Easily.

Steps to Reproduce

1. Create an IngressController that specifies the "HostNetwork" endpoint publishing strategy type:

oc create -f - <<EOF
apiVersion: operator.openshift.io/v1
kind: IngressController
metadata:
  name: example-hostnetwork
  namespace: openshift-ingress-operator
spec:
  domain: example.xyz
  endpointPublishingStrategy:
    type: HostNetwork
EOF

2. Check the ingress operator's logs:

oc -n openshift-ingress-operator logs -c ingress-operator deployments/ingress-operator

Actual results

The ingress operator logs "updated router deployment" multiple times for the "example-hostnetwork" IngressController, such as the following:

2023-06-15T02:11:47.229Z        INFO    operator.ingress_controller     ingress/deployment.go:131       updated router deployment       {"namespace": "openshift-ingress", "name": "router-example-hostnetwork", "diff": "  &v1.Deployment{\n  \tTypeMeta:   {},\n  \tObjectMeta: {Name: \"router-example-hostnetwork\", Namespace: \"openshift-ingress\", UID: \"d7c51022-460e-4962-8521-e00255f649c3\", ResourceVersion: \"3356177\", ...},\n  \tSpec: v1.DeploymentSpec{\n  \t\tReplicas: &2,\n  \t\tSelector: &{MatchLabels: {\"ingresscontroller.operator.openshift.io/deployment-ingresscontroller\": \"example-hostnetwork\"}},\n  \t\tTemplate: v1.PodTemplateSpec{\n  \t\t\tObjectMeta: {Labels: {\"ingresscontroller.operator.openshift.io/deployment-ingresscontroller\": \"example-hostnetwork\", \"ingresscontroller.operator.openshift.io/hash\": \"b7c697fd\"}, Annotations: {\"target.workload.openshift.io/management\": `{\"effect\": \"PreferredDuringScheduling\"}`, \"unsupported.do-not-use.openshift.io/override-liveness-grace-period-seconds\": \"10\"}},\n  \t\t\tSpec: v1.PodSpec{\n  \t\t\t\tVolumes: []v1.Volume{\n  \t\t\t\t\t{Name: \"default-certificate\", VolumeSource: {Secret: &{SecretName: \"router-certs-example-hostnetwork\", DefaultMode: &420}}},\n  \t\t\t\t\t{\n  \t\t\t\t\t\tName: \"service-ca-bundle\",\n  \t\t\t\t\t\tVolumeSource: v1.VolumeSource{\n  \t\t\t\t\t\t\t... // 16 identical fields\n  \t\t\t\t\t\t\tFC:        nil,\n  \t\t\t\t\t\t\tAzureFile: nil,\n  \t\t\t\t\t\t\tConfigMap: &v1.ConfigMapVolumeSource{\n  \t\t\t\t\t\t\t\tLocalObjectReference: {Name: \"service-ca-bundle\"},\n  \t\t\t\t\t\t\t\tItems:                {{Key: \"service-ca.crt\", Path: \"service-ca.crt\"}},\n- \t\t\t\t\t\t\t\tDefaultMode:          &420,\n+ \t\t\t\t\t\t\t\tDefaultMode:          nil,\n  \t\t\t\t\t\t\t\tOptional:             &false,\n  \t\t\t\t\t\t\t},\n  \t\t\t\t\t\t\tVsphereVolume: nil,\n  \t\t\t\t\t\t\tQuobyte:       nil,\n  \t\t\t\t\t\t\t... // 8 identical fields\n  \t\t\t\t\t\t},\n  \t\t\t\t\t},\n  \t\t\t\t\t{\n  \t\t\t\t\t\tName: \"stats-auth\",\n  \t\t\t\t\t\tVolumeSource: v1.VolumeSource{\n  \t\t\t\t\t\t\t... // 3 identical fields\n  \t\t\t\t\t\t\tAWSElasticBlockStore: nil,\n  \t\t\t\t\t\t\tGitRepo:              nil,\n  \t\t\t\t\t\t\tSecret: &v1.SecretVolumeSource{\n  \t\t\t\t\t\t\t\tSecretName:  \"router-stats-example-hostnetwork\",\n  \t\t\t\t\t\t\t\tItems:       nil,\n- \t\t\t\t\t\t\t\tDefaultMode: &420,\n+ \t\t\t\t\t\t\t\tDefaultMode: nil,\n  \t\t\t\t\t\t\t\tOptional:    nil,\n  \t\t\t\t\t\t\t},\n  \t\t\t\t\t\t\tNFS:   nil,\n  \t\t\t\t\t\t\tISCSI: nil,\n  \t\t\t\t\t\t\t... // 21 identical fields\n  \t\t\t\t\t\t},\n  \t\t\t\t\t},\n  \t\t\t\t\t{\n  \t\t\t\t\t\tName: \"metrics-certs\",\n  \t\t\t\t\t\tVolumeSource: v1.VolumeSource{\n  \t\t\t\t\t\t\t... // 3 identical fields\n  \t\t\t\t\t\t\tAWSElasticBlockStore: nil,\n  \t\t\t\t\t\t\tGitRepo:              nil,\n  \t\t\t\t\t\t\tSecret: &v1.SecretVolumeSource{\n  \t\t\t\t\t\t\t\tSecretName:  \"router-metrics-certs-example-hostnetwork\",\n  \t\t\t\t\t\t\t\tItems:       nil,\n- \t\t\t\t\t\t\t\tDefaultMode: &420,\n+ \t\t\t\t\t\t\t\tDefaultMode: nil,\n  \t\t\t\t\t\t\t\tOptional:    nil,\n  \t\t\t\t\t\t\t},\n  \t\t\t\t\t\t\tNFS:   nil,\n  \t\t\t\t\t\t\tISCSI: nil,\n  \t\t\t\t\t\t\t... // 21 identical fields\n  \t\t\t\t\t\t},\n  \t\t\t\t\t},\n  \t\t\t\t},\n  \t\t\t\tInitContainers: nil,\n  \t\t\t\tContainers: []v1.Container{\n  \t\t\t\t\t{\n  \t\t\t\t\t\t... 
// 3 identical fields\n  \t\t\t\t\t\tArgs:       nil,\n  \t\t\t\t\t\tWorkingDir: \"\",\n  \t\t\t\t\t\tPorts: []v1.ContainerPort{\n  \t\t\t\t\t\t\t{\n  \t\t\t\t\t\t\t\tName:          \"http\",\n- \t\t\t\t\t\t\t\tHostPort:      80,\n+ \t\t\t\t\t\t\t\tHostPort:      0,\n  \t\t\t\t\t\t\t\tContainerPort: 80,\n  \t\t\t\t\t\t\t\tProtocol:      \"TCP\",\n  \t\t\t\t\t\t\t\tHostIP:        \"\",\n  \t\t\t\t\t\t\t},\n  \t\t\t\t\t\t\t{\n  \t\t\t\t\t\t\t\tName:          \"https\",\n- \t\t\t\t\t\t\t\tHostPort:      443,\n+ \t\t\t\t\t\t\t\tHostPort:      0,\n  \t\t\t\t\t\t\t\tContainerPort: 443,\n  \t\t\t\t\t\t\t\tProtocol:      \"TCP\",\n  \t\t\t\t\t\t\t\tHostIP:        \"\",\n  \t\t\t\t\t\t\t},\n  \t\t\t\t\t\t\t{\n  \t\t\t\t\t\t\t\tName:          \"metrics\",\n- \t\t\t\t\t\t\t\tHostPort:      1936,\n+ \t\t\t\t\t\t\t\tHostPort:      0,\n  \t\t\t\t\t\t\t\tContainerPort: 1936,\n  \t\t\t\t\t\t\t\tProtocol:      \"TCP\",\n  \t\t\t\t\t\t\t\tHostIP:        \"\",\n  \t\t\t\t\t\t\t},\n  \t\t\t\t\t\t},\n  \t\t\t\t\t\tEnvFrom:       nil,\n  \t\t\t\t\t\tEnv:           {{Name: \"DEFAULT_CERTIFICATE_DIR\", Value: \"/etc/pki/tls/private\"}, {Name: \"DEFAULT_DESTINATION_CA_PATH\", Value: \"/var/run/configmaps/service-ca/service-ca.crt\"}, {Name: \"RELOAD_INTERVAL\", Value: \"5s\"}, {Name: \"ROUTER_ALLOW_WILDCARD_ROUTES\", Value: \"false\"}, ...},\n  \t\t\t\t\t\tResources:     {Requests: {s\"cpu\": {i: {...}, s: \"100m\", Format: \"DecimalSI\"}, s\"memory\": {i: {...}, Format: \"BinarySI\"}}},\n  \t\t\t\t\t\tVolumeMounts:  {{Name: \"default-certificate\", ReadOnly: true, MountPath: \"/etc/pki/tls/private\"}, {Name: \"service-ca-bundle\", ReadOnly: true, MountPath: \"/var/run/configmaps/service-ca\"}, {Name: \"stats-auth\", ReadOnly: true, MountPath: \"/var/lib/haproxy/conf/metrics-auth\"}, {Name: \"metrics-certs\", ReadOnly: true, MountPath: \"/etc/pki/tls/metrics-certs\"}},\n  \t\t\t\t\t\tVolumeDevices: nil,\n  \t\t\t\t\t\tLivenessProbe: &v1.Probe{\n  \t\t\t\t\t\t\tProbeHandler: v1.ProbeHandler{\n  \t\t\t\t\t\t\t\tExec: nil,\n  \t\t\t\t\t\t\t\tHTTPGet: &v1.HTTPGetAction{\n  \t\t\t\t\t\t\t\t\tPath:        \"/healthz\",\n  \t\t\t\t\t\t\t\t\tPort:        {IntVal: 1936},\n  \t\t\t\t\t\t\t\t\tHost:        \"localhost\",\n- \t\t\t\t\t\t\t\t\tScheme:      \"HTTP\",\n+ \t\t\t\t\t\t\t\t\tScheme:      \"\",\n  \t\t\t\t\t\t\t\t\tHTTPHeaders: nil,\n  \t\t\t\t\t\t\t\t},\n  \t\t\t\t\t\t\t\tTCPSocket: nil,\n  \t\t\t\t\t\t\t\tGRPC:      nil,\n  \t\t\t\t\t\t\t},\n  \t\t\t\t\t\t\tInitialDelaySeconds:           0,\n  \t\t\t\t\t\t\tTimeoutSeconds:                1,\n- \t\t\t\t\t\t\tPeriodSeconds:                 10,\n+ \t\t\t\t\t\t\tPeriodSeconds:                 0,\n- \t\t\t\t\t\t\tSuccessThreshold:              1,\n+ \t\t\t\t\t\t\tSuccessThreshold:              0,\n- \t\t\t\t\t\t\tFailureThreshold:              3,\n+ \t\t\t\t\t\t\tFailureThreshold:              0,\n  \t\t\t\t\t\t\tTerminationGracePeriodSeconds: nil,\n  \t\t\t\t\t\t},\n  \t\t\t\t\t\tReadinessProbe: &v1.Probe{\n  \t\t\t\t\t\t\tProbeHandler: v1.ProbeHandler{\n  \t\t\t\t\t\t\t\tExec: nil,\n  \t\t\t\t\t\t\t\tHTTPGet: &v1.HTTPGetAction{\n  \t\t\t\t\t\t\t\t\tPath:        \"/healthz/ready\",\n  \t\t\t\t\t\t\t\t\tPort:        {IntVal: 1936},\n  \t\t\t\t\t\t\t\t\tHost:        \"localhost\",\n- \t\t\t\t\t\t\t\t\tScheme:      \"HTTP\",\n+ \t\t\t\t\t\t\t\t\tScheme:      \"\",\n  \t\t\t\t\t\t\t\t\tHTTPHeaders: nil,\n  \t\t\t\t\t\t\t\t},\n  \t\t\t\t\t\t\t\tTCPSocket: nil,\n  \t\t\t\t\t\t\t\tGRPC:      nil,\n  \t\t\t\t\t\t\t},\n  \t\t\t\t\t\t\tInitialDelaySeconds:           0,\n  
\t\t\t\t\t\t\tTimeoutSeconds:                1,\n- \t\t\t\t\t\t\tPeriodSeconds:                 10,\n+ \t\t\t\t\t\t\tPeriodSeconds:                 0,\n- \t\t\t\t\t\t\tSuccessThreshold:              1,\n+ \t\t\t\t\t\t\tSuccessThreshold:       
      0,\n- \t\t\t\t\t\t\tFailureThreshold:              3,\n+ \t\t\t\t\t\t\tFailureThreshold:              0,\n  \t\t\t\t\t\t\tTerminationGracePeriodSeconds: nil,\n  \t\t\t\t\t\t},\n  \t\t\t\t\t\tStartupProbe: &v1.Probe{\n  \t\t\t\t\t\t\tProbeHandler: v1.ProbeHandler{\n  \t\t\t\t\t\t\t\tExec: nil,\n  \t\t\t\t\t\t\t\tHTTPGet: &v1.HTTPGetAction{\n  \t\t\t\t\t\t\t\t\tPath:        \"/healthz/ready\",\n  \t\t\t\t\t\t\t\t\tPort:        {IntVal: 1936},\n  \t\t\t\t\t\t\t\t\tHost:        \"localhost\",\n- \t\t\t\t\t\t\t\t\tScheme:      \"HTTP\",\n+ \t\t\t\t\t\t\t\t\tScheme:      \"\",\n  \t\t\t\t\t\t\t\t\tHTTPHeaders: nil,\n  \t\t\t\t\t\t\t\t},\n  \t\t\t\t\t\t\t\tTCPSocket: nil,\n  \t\t\t\t\t\t\t\tGRPC:      nil,\n  \t\t\t\t\t\t\t},\n  \t\t\t\t\t\t\tInitialDelaySeconds:           0,\n  \t\t\t\t\t\t\tTimeoutSeconds:                1,\n  \t\t\t\t\t\t\tPeriodSeconds:                 1,\n- \t\t\t\t\t\t\tSuccessThreshold:              1,\n+ \t\t\t\t\t\t\tSuccessThreshold:              0,\n  \t\t\t\t\t\t\tFailureThreshold:              120,\n  \t\t\t\t\t\t\tTerminationGracePeriodSeconds: nil,\n  \t\t\t\t\t\t},\n  \t\t\t\t\t\tLifecycle:              nil,\n  \t\t\t\t\t\tTerminationMessagePath: \"/dev/termination-log\",\n  \t\t\t\t\t\t... // 6 identical fields\n  \t\t\t\t\t},\n  \t\t\t\t},\n  \t\t\t\tEphemeralContainers: nil,\n  \t\t\t\tRestartPolicy:       \"Always\",\n  \t\t\t\t... // 31 identical fields\n  \t\t\t},\n  \t\t},\n  \t\tStrategy:        {Type: \"RollingUpdate\", RollingUpdate: &{MaxUnavailable: &{Type: 1, StrVal: \"25%\"}, MaxSurge: &{}}},\n  \t\tMinReadySeconds: 30,\n  \t\t... // 3 identical fields\n  \t},\n  \tStatus: {ObservedGeneration: 1, Replicas: 2, UpdatedReplicas:
2, UnavailableReplicas: 2, ...},\n  }\n"}

Note the following in the diff:

                                                Ports: []v1.ContainerPort{                                                                                                                                                                                                                                                                                                                                                               
                                                        {                                                                                                                                                                                                                                                                                                                                                                                
                                                                Name:          \"http\",                                                                                                                                                                                                                                                                                                                                                 
-                                                               HostPort:      80,                                                                                                                                                                                                                                                                                                                                                       
+                                                               HostPort:      0,                                                                                                                                                                                                                                                                                                                                                        
                                                                ContainerPort: 80,                                                                                                                                                                                                                                                                                                                                                       
                                                                Protocol:      \"TCP\",                                                                                                                                                                                                                                                                                                                                                  
                                                                HostIP:        \"\",                                                                                                                                                                                                                                                                                                                                                     
                                                        },                                                                                                                                                                                                                                                                                                                                                                               
                                                        {
                                                                Name:          \"https\",
-                                                               HostPort:      443,
+                                                               HostPort:      0,
                                                                ContainerPort: 443,
                                                                Protocol:      \"TCP\",
                                                                HostIP:        \"\",
                                                        },
                                                        {
                                                                Name:          \"metrics\",
-                                                               HostPort:      1936,
+                                                               HostPort:      0,
                                                                ContainerPort: 1936,
                                                                Protocol:      \"TCP\",
                                                                HostIP:        \"\",
                                                        },
                                                },

Expected results

The operator should ignore updates by the API that only set default values. The operator should not perform these unnecessary updates to the router deployment.
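One way the operator could achieve this, as a hedged sketch (the helper name and its placement are assumptions, not the actual fix): apply the same hostPort defaulting to the expected deployment before comparing it with the current one, so API defaulting alone never produces a diff.

```go
package ingress

import appsv1 "k8s.io/api/apps/v1"

// setDefaultHostPorts is a hypothetical normalization helper. When the router
// runs with host networking, the API server defaults each port's hostPort to
// its containerPort; mirroring that here keeps the expected deployment in sync
// with what the API returns, avoiding a spurious update loop.
func setDefaultHostPorts(expected *appsv1.Deployment) {
	if !expected.Spec.Template.Spec.HostNetwork {
		return
	}
	for i := range expected.Spec.Template.Spec.Containers {
		ports := expected.Spec.Template.Spec.Containers[i].Ports
		for j := range ports {
			if ports[j].HostPort == 0 {
				ports[j].HostPort = ports[j].ContainerPort
			}
		}
	}
}
```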

Run isVSphereDiskUUIDEnabled validation also on baremetal platform installation.

 

From the description of https://issues.redhat.com/browse/OCPBUGS-16955: 

The Storage team has observed that if the disk.EnableUUID flag is not enabled on vSphere VMs on any platform, including baremetal, then no symlinks are generated in /dev/disk/by-id for attached disks.

Installing ODF via LSO or similar on such a platform results in a somewhat fragile installation, because the disks themselves could be renamed on reboot and, since no permanent IDs exist for the disks, the PVs could become invalid.

We should update the baremetal install documentation (https://docs.openshift.com/container-platform/4.13/installing/installing_bare_metal/installing-bare-metal.html) so that disk.EnableUUID is always enabled in both IPI and UPI installs.
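For illustration (not from the original report; the node and VM names are placeholders), the symptom and the fix can be checked per VM:

```
# Verify whether stable by-id symlinks exist on the node
$ oc debug node/<node-name> -- chroot /host ls -l /dev/disk/by-id/

# Enable the flag on the VM with govc (run against a powered-off VM)
$ govc vm.change -vm /<datacenter>/vm/<vm-name> -e disk.enableUUID=TRUE
```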

Description of problem

CI is flaky because of test failures such as the following:

[sig-auth][Feature:SCC][Early] should not have pod creation failures during install [Suite:openshift/conformance/parallel]
Run #0: Failed
{  fail [github.com/openshift/origin/test/extended/authorization/scc.go:69]: 1 pods failed before test on SCC errors
Error creating: pods "azure-file-csi-driver-node-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[5]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[6]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[7]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[9]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[10]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.initContainers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.initContainers[0].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[1].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[1].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, spec.containers[2].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[2].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for DaemonSet.apps/v1/azure-file-csi-driver-node -n openshift-cluster-csi-drivers happened 12 times

Ginkgo exit error 1: exit with code 1}

Run #1: Failed
{  fail [github.com/openshift/origin/test/extended/authorization/scc.go:69]: 1 pods failed before test on SCC errors
Error creating: pods "azure-file-csi-driver-node-" is forbidden: unable to validate against any security context constraint: [provider "anyuid": Forbidden: not usable by user or serviceaccount, provider restricted-v2: .spec.securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[1]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[2]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[3]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[4]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[5]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[6]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[7]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[9]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.volumes[10]: Invalid value: "hostPath": hostPath volumes are not allowed to be used, spec.initContainers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.initContainers[0].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, spec.containers[0].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[0].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[0].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, spec.containers[1].securityContext.privileged: Invalid value: true: Privileged containers are not allowed, spec.containers[1].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[1].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, spec.containers[2].securityContext.hostNetwork: Invalid value: true: Host network is not allowed to be used, spec.containers[2].securityContext.containers[0].hostPort: Invalid value: 10302: Host ports are not allowed to be used, provider "restricted": Forbidden: not usable by user or serviceaccount, provider "nonroot-v2": Forbidden: not usable by user or serviceaccount, provider "nonroot": Forbidden: not usable by user or serviceaccount, provider "hostmount-anyuid": Forbidden: not usable by user or serviceaccount, provider "machine-api-termination-handler": Forbidden: not usable by user or serviceaccount, provider "hostnetwork-v2": Forbidden: not usable by user or serviceaccount, provider "hostnetwork": Forbidden: not usable by user or serviceaccount, provider "hostaccess": Forbidden: not usable by user or serviceaccount, provider "privileged": Forbidden: not usable by user or serviceaccount] for DaemonSet.apps/v1/azure-file-csi-driver-node -n openshift-cluster-csi-drivers happened 12 times

Ginkgo exit error 1: exit with code 1}

This particular failure comes from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/901/pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-ovn/1638557668689842176. Search.ci has additional similar errors.

Version-Release number of selected component (if applicable)

I have seen these failures in 4.14 CI jobs.

How reproducible

Presently, search.ci shows the following stats for the past two days:

Found in 0.00% of runs (0.01% of failures) across 131399 total runs and 7623 jobs (19.50% failed) in 1.01s

Steps to Reproduce

1. Post a PR and have bad luck.
2. Check search.ci: https://search.ci.openshift.org/?search=pods+%22azure-file-csi-driver-%28controller%7Cnode%29-%22+is+forbidden&maxAge=168h&context=1&type=bug%2Bissue%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Actual results

CI fails.

Expected results

CI passes, or fails on some other test failure, and the failures don't show up in search.ci.

Description of problem:

Even in environments where containers are manually loaded into the container store, services will fail because they are written to always pull images prior to starting the container (instead of first checking with podman whether the image already exists).

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Backport support, starting in 4.12.z, for the new GCP region europe-west12.

Version-Release number of selected component (if applicable):

4.12.z and 4.13.z

How reproducible:

Always

Steps to Reproduce:

1. Use openshift-install to deploy OCP in europe-west12

Actual results:

europe-west12 is not available as a supported region in the user survey

Expected results:

europe-west12 to be available as a supported region in the user survey

Additional info:

 

Description of problem:

When running a cluster on application credentials, this event appears repeatedly:

ns/openshift-machine-api machineset/nhydri0d-f8dcc-kzcwf-worker-0 hmsg/173228e527 - pathological/true reason/ReconcileError could not find information for "ci.m1.xlarge"

Version-Release number of selected component (if applicable):

 

How reproducible:

Happens in the CI (https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_release/33330/rehearse-33330-periodic-ci-shiftstack-shiftstack-ci-main-periodic-4.13-e2e-openstack-ovn-serial/1633149670878351360).

Steps to Reproduce:

1. On a living cluster, rotate the OpenStack cloud credentials
2. Invalidate the previous credentials
3. Watch the machine-api events (`oc -n openshift-machine-api get event`). A `Warning` event with the message `could not find information for "name-of-the-flavour"` will appear.

If the cluster was installed using a password that you can't invalidate:
1. Rotate the cloud credentials to application credentials
2. Restart MAPO (`oc -n openshift-machine-api get pods -o name | xargs -r oc -n openshift-machine-api delete`)
3. Rotate cloud credentials again
4. Revoke the first application credentials you set
5. Finally watch the events (`oc -n openshift-machine-api get event`)

The event signals that MAPO wasn't able to update flavour information on the MachineSet status.

Actual results:

 

Expected results:

No issue detecting the flavour details

Additional info:

Offending code likely around this line: https://github.com/openshift/machine-api-provider-openstack/blob/bcb08a7835c08d20606d75757228fd03fbb20dab/pkg/machineset/controller.go#L116

Description of problem:

Critical Alert Rules do not have runbook url

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

This bug is being raised by the OpenShift Monitoring team as part of an effort to detect invalid Alert Rules in OCP.

1.  Check details of MultipleDefaultStorageClasses Alert Rule
2.
3.

Actual results:

The Alert Rule MultipleDefaultStorageClasses has Critical Severity, but does not have runbook_url annotation.

Expected results:

All Critical Alert Rules must have a runbook_url annotation

Additional info:

Critical Alerts must have a runbook, please refer to style guide at https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md#style-guide 

The runbooks are located at github.com/openshift/runbooks

To resolve the bug, 
- Add runbooks for the relevant Alerts at github.com/openshift/runbooks
- Add the link to the runbook in the Alert annotation 'runbook_url'
- Remove the exception in the origin test, added in PR https://github.com/openshift/origin/pull/27933
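As a hedged illustration only (the rule name matches the alert above, but the expression, namespace, and runbook path are placeholders, not the operator's actual definition), the fixed alert would carry the annotation like this:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-storage-rules            # placeholder name
  namespace: openshift-cluster-storage-operator
spec:
  groups:
  - name: storage
    rules:
    - alert: MultipleDefaultStorageClasses
      # Illustrative expression only; keep the operator's real expression.
      expr: max_over_time(default_storage_class_count[5m]) > 1
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: More than one default StorageClass is configured.
        runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/cluster-storage-operator/MultipleDefaultStorageClasses.md
```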

Description of problem:

Stop option for pipelinerun is not working

Version-Release number of selected component (if applicable):

Openshift Pipelines 1.9.x

How reproducible:

Always

Steps to Reproduce:

1. Create a pipeline and start it
2. From the Actions dropdown, select the Stop option

Actual results:

Pipelinerun is not getting cancelled

Expected results:

Pipelinerun should get cancelled

Additional info:

 

 

Please review the following PR: https://github.com/openshift/k8s-prometheus-adapter/pull/68

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

OCP 4.14 installation fails.

Waiting for the UPI installation to complete using `wait-for` ends with a cluster operator (CO) error:
```
$ openshift-install wait-for install-complete --log-level=debug

level=error msg=failed to initialize the cluster: Cluster operator control-plane-machine-set is not available
```

```
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          122m    Unable to apply 4.14.0-0.nightly-2023-07-18-085740: the cluster operator control-plane-machine-set is not available
```

```
$ oc get co | grep control-plane-machine-set
control-plane-machine-set                  4.14.0-0.nightly-2023-07-18-085740   False       False         True       6h47m   Missing 3 available replica(s)
```

Version-Release number of selected component (if applicable):

Openshift on Openstack
OCP 4.14.0-0.nightly-2023-07-18-085740
RHOS-16.2-RHEL-8-20230413.n.1
UPI installation

How reproducible:

Always

Steps to Reproduce:

Run the UPI openshift installation  

Actual results:

The UPI installation fails

Expected results:

The UPI installation passes

Additional info:

  • Last UPI successful installation in D/S CI used: 4.14.0-0.nightly-2023-07-05-191022 
  • control-plane-machine-set-operator log:
$ oc logs -n openshift-machine-api control-plane-machine-set-operator-5cbb7f68cc-h5f4p | tail
E0719 14:20:52.645504       1 controller.go:649]  "msg"="Observed unmanaged control plane nodes" "error"="found unmanaged control plane nodes, the following node(s) do not have associated machines: ostest-c2drn-master-0, ostest-c2drn-master-1, ostest-c2drn-master-2" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="1984ddf9-506f-4d10-88e5-0787b305484e" "unmanagedNodes"="ostest-c2drn-master-0,ostest-c2drn-master-1,ostest-c2drn-master-2"
I0719 14:20:52.645530       1 controller.go:268]  "msg"="Cluster state is degraded. The control plane machine set will not take any action until issues have been resolved." "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="1984ddf9-506f-4d10-88e5-0787b305484e"
I0719 14:20:52.667462       1 controller.go:212]  "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="1984ddf9-506f-4d10-88e5-0787b305484e"
I0719 14:20:52.668013       1 controller.go:156]  "msg"="Reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="3f095b75-21af-4475-b0fd-25052e8c3bce"
I0719 14:20:52.668718       1 controller.go:121]  "msg"="Reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="e80d898c-9a8d-4774-8f22-fb464be45758"
I0719 14:20:52.668780       1 controller.go:142]  "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="e80d898c-9a8d-4774-8f22-fb464be45758"
I0719 14:20:52.669005       1 status.go:119]  "msg"="Observed Machine Configuration" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "observedGeneration"=1 "readyReplicas"=0 "reconcileID"="3f095b75-21af-4475-b0fd-25052e8c3bce" "replicas"=0 "unavailableReplicas"=3 "updatedReplicas"=0
E0719 14:20:52.669237       1 controller.go:649]  "msg"="Observed unmanaged control plane nodes" "error"="found unmanaged control plane nodes, the following node(s) do not have associated machines: ostest-c2drn-master-0, ostest-c2drn-master-1, ostest-c2drn-master-2" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="3f095b75-21af-4475-b0fd-25052e8c3bce" "unmanagedNodes"="ostest-c2drn-master-0,ostest-c2drn-master-1,ostest-c2drn-master-2"
I0719 14:20:52.669267       1 controller.go:268]  "msg"="Cluster state is degraded. The control plane machine set will not take any action until issues have been resolved." "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="3f095b75-21af-4475-b0fd-25052e8c3bce"
I0719 14:20:52.669842       1 controller.go:212]  "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="3f095b75-21af-4475-b0fd-25052e8c3bce"
  • The nodes are up:
[cloud-user@installer-host ~]$ oc get nodes
NAME                    STATUS   ROLES                  AGE     VERSION
ostest-c2drn-master-0   Ready    control-plane,master   6h55m   v1.27.3+4aaeaec
ostest-c2drn-master-1   Ready    control-plane,master   6h55m   v1.27.3+4aaeaec
ostest-c2drn-master-2   Ready    control-plane,master   6h55m   v1.27.3+4aaeaec
ostest-c2drn-worker-0   Ready    worker                 6h36m   v1.27.3+4aaeaec
ostest-c2drn-worker-1   Ready    worker                 6h35m   v1.27.3+4aaeaec
ostest-c2drn-worker-2   Ready    worker                 6h36m   v1.27.3+4aaeaec 

 

Description of problem:

Due to a CI configuration issue (nmstatectl is missing from the image), the current CI unit-test job silently skips the unit tests that require nmstatectl.

Version-Release number of selected component (if applicable):


How reproducible:

hack/go-test.sh

Steps to Reproduce:

1.
2.
3.

Actual results:

Unit tests are failing

Expected results:

No failure

Additional info:


Description of problem:

No MachineSet is created for workers if replicas == 0

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

replicas: 0 in install-config for workers

Steps to Reproduce:

1. Deploy a cluster with 0 worker
2. After deployment, list MachineSets
3. Observe that zero worker MachineSets exist

Actual results:

No MachineSet found:
No resources found in openshift-machine-api namespace.

Expected results:

A worker MachineSet should have been created like before.

Additional info:

We broke it during CPMS integration.
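For reference (not from the original report; all other fields omitted), the reproducer's install-config compute stanza is simply:

```yaml
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  replicas: 0
```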

Description of problem:

library-go should use Lease for leader election by default.
In 4.10 we switched from ConfigMaps to ConfigMapsLeases; now we can switch to Leases.

Change library-go to use Lease by default; we already have an open PR for that: https://github.com/openshift/library-go/pull/1448 (a minimal client-go sketch follows the component list below).

Once the PR merges, we should revendor library-go for:
- kas operator
- oas operator
- etcd operator
- kcm operator
- openshift controller manager operator
- scheduler operator
- auth operator
- cluster policy controller
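This is not library-go's actual API, just a rough client-go sketch (the namespace, lock name, identity, and timings are placeholders) of what a Lease-based leader lock amounts to:

```go
package main

import (
	"context"
	"time"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

// runWithLeaseLock runs fn only while this process holds a Lease-based leader lock.
func runWithLeaseLock(ctx context.Context, cfg *rest.Config, fn func(context.Context)) error {
	client := kubernetes.NewForConfigOrDie(cfg)
	// Lease objects (coordination.k8s.io) replace the older ConfigMap-based locks.
	lock, err := resourcelock.New(
		resourcelock.LeasesResourceLock,
		"openshift-example-operator", // namespace (placeholder)
		"example-operator-lock",      // lock name (placeholder)
		client.CoreV1(),
		client.CoordinationV1(),
		resourcelock.ResourceLockConfig{Identity: "example-operator-pod"}, // placeholder
	)
	if err != nil {
		return err
	}
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 137 * time.Second, // illustrative timings
		RenewDeadline: 107 * time.Second,
		RetryPeriod:   26 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: fn,
			OnStoppedLeading: func() {}, // typically exit so a new leader can take over
		},
	})
	return nil
}
```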
 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

According to https://docs.aws.amazon.com/vpc/latest/userguide/amazon-vpc-limits.html, the default number of security groups per network interface is 5 and can be raised to at most 16, so the installer should pre-check the number of provided custom security groups.

When more than 15 are provided (the maximum is 16, but the installer also creates one ${var.cluster_id}-master-sg/${var.cluster_id}-worker-sg), the installer should quit and warn the user.

Version-Release number of selected component (if applicable):

registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-07-11-092038

How reproducible:

Always

Steps to Reproduce:

1. Set 16 Security groups IDs in compute.platform.aws.additionalSecurityGroupIDs

  compute:
 - architecture: amd64
   hyperthreading: Enabled
   name: worker
   platform:
     aws:
       additionalSecurityGroupIDs:
       - sg-06e63a6ad731c10cc
       - sg-054614d4f4eb5751d
       - sg-05c4fe202c8e2c28c
       - sg-0c948fa8b85bf4af1
       - sg-0cfb0c91c0b48f0de
       - sg-0eff6077ca727c921
       - sg-0d2d1f41f1ac9801c
       - sg-047c67d5decb64563
       - sg-0ee63f164c0ab8b04
       - sg-033ff80fa12e43c7f
       - sg-0ccad43754d9652cd
       - sg-04e4cbca2b5d50c3a
       - sg-0d133411fdcb0a4e0
       - sg-0b2b0e0d515b2f561
       - sg-045fde620b3e702da
       - sg-07e0493a65749973c
   replicas: 3

2. The installation failed due to workers couldn't be provisioned. 

Actual results:

[root@preserve-gpei-worker k_files]# oc get machines -A
NAMESPACE               NAME                                       PHASE     TYPE         REGION      ZONE         AGE
openshift-machine-api   gpei-0613g-wp7zw-master-0                  Running   m6i.xlarge   us-west-2   us-west-2a   66m
openshift-machine-api   gpei-0613g-wp7zw-master-1                  Running   m6i.xlarge   us-west-2   us-west-2b   66m
openshift-machine-api   gpei-0613g-wp7zw-master-2                  Running   m6i.xlarge   us-west-2   us-west-2a   66m
openshift-machine-api   gpei-0613g-wp7zw-worker-us-west-2a-7rszc   Failed                                          62m
openshift-machine-api   gpei-0613g-wp7zw-worker-us-west-2a-pwnvp   Failed                                          62m
openshift-machine-api   gpei-0613g-wp7zw-worker-us-west-2b-n2cs9   Failed                                          62m
[root@preserve-gpei-worker k_files]# oc describe machine gpei-0613g-wp7zw-worker-us-west-2b-n2cs9 -n openshift-machine-api
Name:         gpei-0613g-wp7zw-worker-us-west-2b-n2cs9
..
Spec:
  Lifecycle Hooks:
  Metadata:
  Provider Spec:
    Value:
      Ami:
        Id:         ami-01bfc200595c748a1
      API Version:  machine.openshift.io/v1beta1
      Block Devices:
        Ebs:
      Metadata Service Options:
      Placement:
        Availability Zone:  us-west-2b
        Region:             us-west-2
      Security Groups:
        Filters:
          Name:  tag:Name
          Values:
            gpei-0613g-wp7zw-worker-sg
        Id:  sg-033ff80fa12e43c7f
        Id:  sg-045fde620b3e702da
        Id:  sg-047c67d5decb64563
        Id:  sg-04e4cbca2b5d50c3a
        Id:  sg-054614d4f4eb5751d
        Id:  sg-05c4fe202c8e2c28c
        Id:  sg-06e63a6ad731c10cc
        Id:  sg-07e0493a65749973c
        Id:  sg-0b2b0e0d515b2f561
        Id:  sg-0c948fa8b85bf4af1
        Id:  sg-0ccad43754d9652cd
        Id:  sg-0cfb0c91c0b48f0de
        Id:  sg-0d133411fdcb0a4e0
        Id:  sg-0d2d1f41f1ac9801c
        Id:  sg-0ee63f164c0ab8b04
        Id:  sg-0eff6077ca727c921
      Subnet:
        Id:  subnet-0641814f00311bd9c
      Tags:
        Name:   kubernetes.io/cluster/gpei-0613g-wp7zw
        Value:  owned
      User Data Secret:
        Name:  worker-user-data
Status:
  Conditions:
    Last Transition Time:  2023-07-13T09:58:02Z
    Status:                True
    Type:                  Drainable
    Last Transition Time:  2023-07-13T09:58:02Z
    Message:               Instance has not been created
    Reason:                InstanceNotCreated
    Severity:              Warning
    Status:                False
    Type:                  InstanceExists
    Last Transition Time:  2023-07-13T09:58:02Z
    Status:                True
    Type:                  Terminable
  Error Message:           error launching instance: You have exceeded the maximum number of security groups allowed per network interface.

Expected results:

Installer could abort and prompt the provided custom security group number exceeded the maximum number allowed.

Additional info:
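A minimal validation sketch, assuming a hypothetical pre-flight helper (the function name and package are not from the installer code):

```go
package validation

import "fmt"

// validateAdditionalSecurityGroups is a hypothetical pre-flight check. AWS allows
// at most 16 security groups per network interface, and the installer always
// attaches one of its own (<cluster-id>-master-sg / <cluster-id>-worker-sg),
// so at most 15 custom IDs can be accepted per machine pool.
func validateAdditionalSecurityGroups(pool string, ids []string) error {
	const maxCustom = 15
	if len(ids) > maxCustom {
		return fmt.Errorf("%s: %d additionalSecurityGroupIDs provided, but at most %d are allowed because the installer-created security group also counts toward the per-interface limit of 16", pool, len(ids), maxCustom)
	}
	return nil
}
```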


Please review the following PR: https://github.com/openshift/cloud-provider-nutanix/pull/12

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Enabling IPSec doesn't result in IPsec tunnels being created

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Deploy & Enable IPSec

Steps to Reproduce:

1.
2.
3.

Actual results:

000 Total IPsec connections: loaded 0, active 0
000  
000 State Information: DDoS cookies not required, Accepting new IKE connections
000 IKE SAs: total(0), half-open(0), open(0), authenticated(0), anonymous(0)
000 IPsec SAs: total(0), authenticated(0), anonymous(0)

Expected results:

Active connections > 0

Additional info:

06:49 $ oc -n openshift-ovn-kubernetes -c nbdb rsh ovnkube-master-qw4zv \ovn-nbctl --no-leader-only get nb_global . ipsec
true
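For context (not from the original report), IPsec for OVN-Kubernetes is typically enabled by adding an empty ipsecConfig to the cluster network operator configuration; a sketch of that step:

```
$ oc patch networks.operator.openshift.io cluster --type=merge \
    -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipsecConfig":{}}}}}'
```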

Description of problem:

Users cannot install single-node-openshift if the hostname contains the word etcd

Version-Release number of selected component (if applicable):

Probably since 4.8

How reproducible:

100%

Steps to Reproduce:

1. Install SNO with either Assisted or BIP
2. Make sure node hostname is etcd-1 (e.g. via DHCP hostname)

Actual results:

Bootstrap phase never ends

Expected results:

Bootstrap phase should complete successfully

Additional info:

This code is the likely culprit - it uses a naive way to check if etcd is running, accidentally capturing the node name (which contains etcd) in the crictl output as "evidence" that etcd is still running, so it never completes.

See OCPBUGS-15826 (aka AITRIAGE-7677)
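A hedged illustration of a more precise check than matching "etcd" anywhere in the crictl output (assuming crictl is what the bootstrap script uses): filter by container name, so a hostname containing "etcd" can never match.

```
# Lists only containers whose name is exactly "etcd",
# regardless of what the node hostname contains.
$ crictl ps -q --name '^etcd$'
```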

This is a clone of issue OCPBUGS-19398. The following is the description of the original issue:

Description of problem:

IPI on IBM Cloud does not currently support the new eu-es region

Version-Release number of selected component (if applicable):

4.15

How reproducible:

100%

Steps to Reproduce:

1. Create install-config.yaml for IBM Cloud, per docs, using eu-es region
2. Create the manifests (or cluster) using IPI

Actual results:

level=error msg=failed to fetch Master Machines: failed to load asset "Install Config": failed to create install config: invalid "install-config.yaml" file: platform.ibmcloud.region: Unsupported value: "eu-es": supported values: "us-south", "us-east", "jp-tok", "jp-osa", "au-syd", "ca-tor", "eu-gb", "eu-de", "br-sao"

Expected results:

Successful IBM Cloud OCP cluster in eu-es

Additional info:

IBM Cloud has started testing a potential fix, in eu-es to confirm supported cluster types (Public, Private, BYON) all work properly in eu-es
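For reference, the relevant install-config fragment is just the region value (all other fields omitted):

```yaml
platform:
  ibmcloud:
    region: eu-es
```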

 

Description of problem:

A build which works on 4.12 errored out on 4.13.

Version-Release number of selected component (if applicable):

oc --context build02 get clusterversion version
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-ec.3   True        False         4d2h    Cluster version is 4.13.0-ec.3

How reproducible:

Always

Steps to Reproduce:

1. oc new-project hongkliu-test
2. oc create is test-is --as system:admin
3. oc apply -f test-bc.yaml # the file is in the attachment

Actual results:

oc --context build02 logs test-bc-5-build
Defaulted container "docker-build" out of: docker-build, manage-dockerfile (init)
time="2023-02-20T19:13:38Z" level=info msg="Not using native diff for overlay, this may cause degraded performance for building images: kernel has CONFIG_OVERLAY_FS_REDIRECT_DIR enabled"
I0220 19:13:38.405163       1 defaults.go:112] Defaulting to storage driver "overlay" with options [mountopt=metacopy=on].
Caching blobs under "/var/cache/blobs".
Pulling image image-registry.openshift-image-registry.svc:5000/ci/html-proofer@sha256:684aae4e929e596f7042c34a3604c81137860187305f775c2380774bda4b6b08 ...
Trying to pull image-registry.openshift-image-registry.svc:5000/ci/html-proofer@sha256:684aae4e929e596f7042c34a3604c81137860187305f775c2380774bda4b6b08...
Getting image source signatures
Copying blob sha256:aa8ae8202b42d1c70c3a7f65680eabc1c562a29227549b9a1b33dc03943b20d2
Copying blob sha256:31326f32ac37d5657248df0a6aa251ec6a416dab712ca1236ea40ca14322a22c
Copying blob sha256:b21786fe7c0d7561a5b89ca15d8a1c3e4ea673820cd79f1308bdfd8eb3cb7142
Copying blob sha256:68296e6645b26c3af42fa29b6eb7f5befa3d8131ef710c25ec082d6a8606080d
Copying blob sha256:6b1c37303e2d886834dab68eb5a42257daeca973bbef3c5d04c4868f7613c3d3
Copying blob sha256:cbdbe7a5bc2a134ca8ec91be58565ec07d037386d1f1d8385412d224deafca08
Copying blob sha256:46cf6a1965a3b9810a80236b62c42d8cdcd6fb75f9b58d1b438db5736bcf2669
Copying config sha256:9aefe4e59d3204741583c5b585d4d984573df8ff751c879c8a69379c168cb592
Writing manifest to image destination
Storing signatures
Adding transient rw bind mount for /run/secrets/rhsm
STEP 1/4: FROM image-registry.openshift-image-registry.svc:5000/ci/html-proofer@sha256:684aae4e929e596f7042c34a3604c81137860187305f775c2380774bda4b6b08
STEP 2/4: RUN apk add --no-cache bash
fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.11/community/x86_64/APKINDEX.tar.gz
(1/1) Installing bash (5.0.11-r1)
Executing bash-5.0.11-r1.post-install
ERROR: bash-5.0.11-r1.post-install: script exited with error 127
Executing busybox-1.31.1-r9.trigger
ERROR: busybox-1.31.1-r9.trigger: script exited with error 127
1 error; 21 MiB in 40 packages
error: build error: building at STEP "RUN apk add --no-cache bash": while running runtime: exit status 1

Expected results:

 

Additional info:

Run the build on build01 (4.12.4) and it works fine.

oc --context build01 get clusterversion version
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.4    True        False         2d11h   Cluster version is 4.12.4

Description of problem:

I created a cluster with workerLatencyProfile: LowUpdateSlowReaction, then edited the latencyProfile to MediumUpdateAverageReaction using the linked documentation and the test case document below. Once I switched, I waited for KubeControllerManager and KubeAPIServer to stop progressing/complete and noticed that nodeStatusUpdateFrequency under /etc/kubernetes/kubelet.conf does not change as expected.

https://docs.google.com/document/d/19dPIE4WFxVc3ldu-hNoXiOkjBCQrHC6I7wfyaUyTDqw/edit#heading=h.kf4qxogy9r6
Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-07-31-181848

How reproducible:

100% 

Steps to Reproduce:

1. Create cluster with LowUpdateSlowReaction manifest: Example: https://docs.google.com/document/d/19dPIE4WFxVc3ldu-hNoXiOkjBCQrHC6I7wfyaUyTDqw/edit#heading=h.22najgyaj9lh
2. Validate values of low update profile components 

$ oc debug node/<worker-node-name>
$ chroot /host 
$ sh-4.4# cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency 
  "nodeStatusUpdateFrequency": "1m0s",
$ oc get KubeControllerManager -o yaml | grep -A 1 node-monitor
        node-monitor-grace-period:
        - 5m0s
$ oc get KubeAPIServer -o yaml | grep -A 1 default-
        default-not-ready-toleration-seconds:
        - "60"
        Default-unreachable-toleration-seconds:
        - "60"
3. oc edit nodes.config/cluster
spec: 
  workerLatencyProfile: MediumUpdateAverageReaction
4. Wait for components to complete using 

oc get KubeControllerManager -o yaml | grep -i workerlatency -A 5 -B 5
and 
oc get KubeAPIServer -o yaml | grep -i workerlatency -A 5 -B 5

5. Validate medium component values, hitting error here


Actual results:

% oc get KubeControllerManager -o yaml | grep -A 1 node-monitor
        node-monitor-grace-period:
        - 2m0s
prubenda@prubenda1-mac lrc % oc get KubeAPIServer -o yaml | grep -A 1 default-
        default-not-ready-toleration-seconds:
        - "60"
        default-unreachable-toleration-seconds:
        - "60"
sh-5.1# cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency 
  "nodeStatusUpdateFrequency": "1m0s",

Expected results:

$ oc debug node/<worker-node-name>
$ chroot /host 
$ sh-4.4# cat /etc/kubernetes/kubelet.conf | grep nodeStatusUpdateFrequency 
  "nodeStatusUpdateFrequency": "20s",
$ oc get KubeControllerManager -o yaml | grep -A 1 node-monitor
        node-monitor-grace-period:
        - 2m0s
$ oc get KubeAPIServer -o yaml | grep -A 1 default-
        default-not-ready-toleration-seconds:
        - "60"
        default-unreachable-toleration-seconds:
        - "60"

Additional info:

The documentation states that workers will be marked as disabled while the change is being applied, but I never saw that occur.

Description of problem:

Porting rhbz#2057740 to Jira. Pods without a controller: true entry in ownerReferences are not gracefully drained by the autoscaler (and potentially other drain-library drainers). Checking a recent 4.13 CI run:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.13-e2e-aws-ovn/1625150492994703360/artifacts/e2e-aws-ovn/gather-extra/artifacts/pods.json | jq -r '.items[].metadata | select([(.ownerReferences // [])[] | select(.controller)] | length == 0) | .namespace + " " + .name + " " + (.ownerReferences | tostring)' | grep -v '^\(openshift-etcd\|openshift-kube-apiserver\|openshift-kube-controller-manager\|openshift-kube-scheduler\) ' 
openshift-marketplace certified-operators-fnm5z [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"certified-operators","uid":"4eb36072-7c56-4663-9b5a-fd23cee85432"}]
openshift-marketplace community-operators-nrfl6 [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"community-operators","uid":"0e164593-5656-4592-9915-1a5367a6a548"}]
openshift-marketplace redhat-marketplace-7j7k9 [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"redhat-marketplace","uid":"14b910c4-0e45-4188-ab57-671070b6a9f1"}]
openshift-marketplace redhat-operators-hxhxw [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"redhat-operators","uid":"ca9028e5-affb-4537-81f1-15e3a5129c6e"}]

Version-Release number of selected component (if applicable):

At least 4.11 and 4.13 (above). Likely all OpenShift 4.y releases that have had these openshift-marketplace pods.

How reproducible:

100%

Steps to Reproduce:

1. Launch a cluster.
2. Inspect the openshift-marketplace pods with: oc -n openshift-marketplace get -o json pods | jq -r '.items[].metadata | select(.namespace == "openshift-marketplace" and (([.ownerReferences[] | select(.controller == true)]) | length) == 0) | .name + " " + (.ownerReferences | tostring)'

Actual results:

certified-operators-fnm5z [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"certified-operators","uid":"4eb36072-7c56-4663-9b5a-fd23cee85432"}]
community-operators-nrfl6 [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"community-operators","uid":"0e164593-5656-4592-9915-1a5367a6a548"}]
redhat-marketplace-7j7k9 [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"redhat-marketplace","uid":"14b910c4-0e45-4188-ab57-671070b6a9f1"}]
redhat-operators-hxhxw [{"apiVersion":"operators.coreos.com/v1alpha1","blockOwnerDeletion":false,"controller":false,"kind":"CatalogSource","name":"redhat-operators","uid":"ca9028e5-affb-4537-81f1-15e3a5129c6e"}]

Expected results:

No output.

Additional info:

Figuring out which resource to list as the controller is tricky, but there are workarounds, including pointing at the triggering resource or a ClusterOperator as the controller.
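For illustration only, this is roughly what an ownerReferences entry would look like if the CatalogSource were marked as the controller; whether the CatalogSource (or a ClusterOperator) is the right choice is exactly the open question above, and the field values are copied from the first pod listed:

ownerReferences:
- apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  name: certified-operators
  uid: 4eb36072-7c56-4663-9b5a-fd23cee85432
  controller: true
  blockOwnerDeletion: false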

Please review the following PR: https://github.com/openshift/machine-api-provider-ibmcloud/pull/18

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

CSI storage capacity tracking has been GA since Kubernetes 1.24, yet must-gather does not collect CSIStorageCapacity objects. It would be useful for single-node clusters with LVMO, but other clusters could benefit from it too.

Version-Release number of selected component (if applicable):

4.11.0

How reproducible:

always

Steps to Reproduce:

1. oc adm must-gather

Actual results:

Output does not contain CSIStorageCapacity objects

Expected results:

Output contains CSIStorageCapacity objects

Additional info:

We should go through all new additions to the storage APIs (storage.k8s.io/v1) and add any missing items to must-gather.
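Until must-gather collects them, the objects can be gathered manually; a minimal sketch:

$ oc get csistoragecapacities --all-namespaces -o yaml > csistoragecapacities.yaml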

Description of problem:

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

The console should display an alert about the deprecation of DeploymentConfig in 4.14.

Version-Release number of selected component (if applicable):

pre-merge

How reproducible:

Always

Steps to Reproduce:

1. 
2.
3.

Actual results:

The alert is missing

Expected results:

The alert should exist

Additional info:

 

This is a clone of issue OCPBUGS-19761. The following is the description of the original issue:

Description of problem:

When running must-gather against an SNO with Telco DU profile the perf-node-gather-daemonset seems to not be able to start with the error below:

 Warning  FailedCreate  2m37s (x16 over 5m21s)  daemonset-controller  Error creating: pods "perf-node-gather-daemonset-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride the pod namespace "openshift-must-gather-sbhml" does not allow the workload type management

must-gather shows it's retrying for 300s and reports that performance data collection was complete even though the daemonset pod didn't come up.

[must-gather-nhbgr] POD 2023-09-26T10:15:39.591582116Z Waiting for performance profile collector pods to become ready: 1
[..]
[must-gather-nhbgr] POD 2023-09-26T10:21:07.108893075Z Waiting for performance profile collector pods to become ready: 300
[must-gather-nhbgr] POD 2023-09-26T10:21:08.473217146Z daemonset.apps "perf-node-gather-daemonset" deleted
[must-gather-nhbgr] POD 2023-09-26T10:21:08.480906220Z INFO: Node performance data collection complete.

Version-Release number of selected component (if applicable):

4.14.0-rc.2

How reproducible:

100%

Steps to Reproduce:

1. Deploy SNO with Telco DU profile
2. Run oc adm must-gather

Actual results:

performance data collection doesn't run because daemonset cannot be scheduled.

Expected results:

performance data collection runs.

Additional info:

DaemonSet describe:

oc -n openshift-must-gather-sbhml describe ds
Name:           perf-node-gather-daemonset
Selector:       name=perf-node-gather-daemonset
Node-Selector:  <none>
Labels:         <none>
Annotations:    deprecated.daemonset.template.generation: 1
Desired Number of Nodes Scheduled: 1
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       name=perf-node-gather-daemonset
  Annotations:  target.workload.openshift.io/management: {"effect": "PreferredDuringScheduling"}
  Containers:
   node-probe:
    Image:      registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/openshift-release-dev@sha256:2af2c135f69f162ed8e0cede609ddbd207d71a3c7bd49e9af3fcbb16737aa25a
    Port:       <none>
    Host Port:  <none>
    Command:
      /bin/bash
      -c
      echo ok > /tmp/healthy && sleep INF
    Limits:
      cpu:     100m
      memory:  256Mi
    Requests:
      cpu:        100m
      memory:     256Mi
    Readiness:    exec [cat /tmp/healthy] delay=5s timeout=1s period=5s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /host/podresources from podres (rw)
      /host/proc from proc (ro)
      /host/sys from sys (ro)
      /lib/modules from lib-modules (ro)
  Volumes:
   sys:
    Type:          HostPath (bare host directory volume)
    Path:          /sys
    HostPathType:  Directory
   proc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:  Directory
   lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:  Directory
   podres:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pod-resources
    HostPathType:  Directory
Events:
  Type     Reason        Age                     From                  Message
  ----     ------        ----                    ----                  -------
  Warning  FailedCreate  2m37s (x16 over 5m21s)  daemonset-controller  Error creating: pods "perf-node-gather-daemonset-" is forbidden: autoscaling.openshift.io/ManagementCPUsOverride the pod namespace "openshift-must-gather-sbhml" does not allow the workload type management
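One possible workaround sketch, assuming the namespace annotation used by the ManagementCPUsOverride admission plugin is what is missing (namespace name taken from the error message above):

$ oc annotate namespace openshift-must-gather-sbhml workload.openshift.io/allowed=management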

Description of problem:

A customer has reported that the Thanos querier pods would be OOM-killed when loading the API performance dashboard with large time ranges (e.g. >= 1 week) 

Version-Release number of selected component (if applicable):

4.10

How reproducible:

Always for the customer

Steps to Reproduce:

1. Open the "API performance" dashboard in the admin console.
2. Select a time range of 2 weeks.
3.

Actual results:

The dashboard fails to refresh and the thanos-query pods are killed.

Expected results:

The dashboard loads without error.

Additional info:

The issue arises for the customer because they have very large clusters (hundreds of nodes) which generate lots of metrics.
In practice the queries executed by the dashboard are costly because they access lots of series (probably tens of thousands or more). To make this more efficient, the "upstream" dashboard from kubernetes-monitoring/kubernetes-mixin uses recording rules [1] instead of raw queries. While this slightly decreases accuracy (one can only distinguish between read & write API requests), it is the only practical way to avoid overloading the Thanos query endpoint.

[1] https://github.com/kubernetes-monitoring/kubernetes-mixin/blob/05a58f765eda05902d4f7dd22098a2b870f7ca1e/dashboards/apiserver.libsonnet#L50-L75
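As an illustration of the recording-rule approach (not the exact upstream rules; the rule name and label set here are assumptions), a rule like the following pre-aggregates the raw apiserver_request_total series so the dashboard only has to query a handful of series:

groups:
- name: apiserver-dashboard.rules
  rules:
  # Pre-aggregate request rates by verb instead of querying tens of thousands of raw series.
  - record: verb:apiserver_request_total:rate5m
    expr: sum by (verb) (rate(apiserver_request_total{job="apiserver"}[5m]))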

 

Description of problem:

The oc login --web command fails when used with a Hypershift Guest Cluster. The web console returns an error message stating that the client is unauthorized to request a token using this method.
Error Message:
{  "error": "unauthorized_client",  
"error_description": "The client is not authorized to request a token using this method."
}

OCP does not have this issue.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-11-21-212406
4.14
4.15

How reproducible:

always

Steps to Reproduce:

1. Install a HyperShift guest cluster.
2. Configure any OpenID identity provider for the HyperShift guest cluster, e.g. https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-62511
3. Execute the oc login --web $URL command.

4. After adding openshift-cli-client manually, it works:
# cat oauth.yaml
apiVersion: oauth.openshift.io/v1
grantMethod: auto
kind: OAuthClient
metadata:
  name: openshift-cli-client
redirectURIs:
- http://127.0.0.1/callback,http://[::1]/callback
respondWithChallenges: false

# oc create -f oauth.yaml
oauthclient.oauth.openshift.io/openshift-cli-client created

$ oc login --web $URL
Opening login URL in the default browser: https://oauth-clusters-hypershift-ci-28276.apps.xxxxxxxxxxxxxxxx.com:443/oauth/authorize?client_id=openshift-cli-client&code_challenge=mixnB73nR_yzL58e0lEd4soQH1sn0GjvWEfnX4PNrCg&code_challenge_method=S256&redirect_uri=http%3A%2F%2F127.0.0.1%3A45055%2Fcallback&response_type=code
Login successful.

Actual results:

Step 3: The web login process fails and redirects to an error page displaying the error message "error_description": "The client is not authorized to request a token using this method."

Expected results:

The OAuthClient 'openshift-cli-client' should not be missing on HyperShift guest clusters, so that the oc login --web $URL command works without any issues. OCP 4.13+ has the OAuthClient 'openshift-cli-client' by default.

Additional info:

The issue can be tracked at the following URL: https://issues.redhat.com/browse/AUTH-444

Root cause:
The default 'openshift-cli-client' OAuthClient is missing on HyperShift guest clusters.

This is a clone of issue OCPBUGS-25140. The following is the description of the original issue:

This is a clone of issue OCPBUGS-24408. The following is the description of the original issue:

Description of problem:


    

Version-Release number of selected component (if applicable):


    

How reproducible:


    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:


    

Expected results:


    

Additional info:


    

Description of problem:

Now that the bug to include libnmstate 2.2.x has been resolved (https://issues.redhat.com/browse/OCPBUGS-11659), we are seeing a boot issue in which agent-tui can't start. It appears to be failing to find the libnmstate.so.2 symlink; when it is run directly we see
$ /usr/local/bin/agent-tui
/usr/local/bin/agent-tui: error while loading shared libraries: libnmstate.so.2: cannot open shared object file: No such file or directory

This results in neither the console nor ssh being available during bootstrap, which makes debugging difficult. However, it does not affect the installation; we still get a successful install. The bootstrap screenshots are attached.
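A hedged way to confirm the missing SONAME symlink from the live environment (paths are assumptions):

$ ldd /usr/local/bin/agent-tui | grep -i nmstate
$ ls -l /usr/lib64/libnmstate.so*
# ldd should resolve libnmstate.so.2; if ls only shows the fully versioned file,
# the SONAME symlink is missing from the image and needs to be added at image build time.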

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

HyperShift does not define liveness and readiness probes on openshift-route-controller-manager and openshift-controller-manager, even though the existing health endpoints could be used.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1.Create OCP cluster using Hypershift
2.Look at openshift-route-controller-manager and openshift-controller-manager yaml manifests

Actual results:

No probes defined for pods of those two deployments

Expected results:

Probes should be defined because the services implement them

Additional info:

This is the result of a security review for 4.12 Hypershift, original investigation can be found https://github.ibm.com/alchemy-containers/armada-update/issues/4117#issuecomment-53149378
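A hypothetical sketch of the kind of probe stanza the container spec in those deployments could define; the health path, port, and scheme are assumptions and would need to match what the services actually expose:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8443
    scheme: HTTPS
  initialDelaySeconds: 30
readinessProbe:
  httpGet:
    path: /healthz
    port: 8443
    scheme: HTTPS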

Description of problem:

According to the Red Hat documentation https://docs.openshift.com/container-platform/4.12/networking/ovn_kubernetes_network_provider/configuring-egress-ips-ovn.html, the maximum number of IP aliases per node is 10 - "Per node, the maximum number of IP aliases, both IPv4 and IPv6, is 10.".

Looking at the code base, the number of allowed IPs is calculated as
Capacity = defaultGCPPrivateIPCapacity (which is set to 10) + cloudPrivateIPsCount (that is number of available IPs from the range) - currentIPv4Usage (number of assigned v4 IPs) - currentIPv6Usage (number of assigned v6 IPs)
https://github.com/openshift/cloud-network-config-controller/blob/master/pkg/cloudprovider/gcp.go#L18-L22

Speaking to GCP, they support up to 100 alias IP ranges (not IPs) per vNIC.

Can Red Hat confirm
1) If there is a limitation of 10 from OCP and why?
2) If there isn't a limit, what is the maximum number of egress IPs that could be supported per node?
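For context, the per-node capacity that OVN-Kubernetes actually advertises can be inspected directly; a minimal sketch (node name is a placeholder):

# Capacity reported by the cloud network config controller on the node:
$ oc get node <node-name> -o yaml | grep cloud.network.openshift.io/egress-ipconfig
# Egress IPs currently assigned through the cloud API:
$ oc get cloudprivateipconfigs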

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Case: 03487893
This is one of the most highlighted bugs from our customer.

 

Description of problem:

For unknown reasons, the management cluster AWS endpoint service sometimes has an active connection leftover. This blocks the uninstallation, as the AWS endpoint service cannot be deleted before this connection is rejected.

Version-Release number of selected component (if applicable):

4.12.z,4.13.z,4.14.z

How reproducible:

Irregular

Steps to Reproduce:

1.
2.
3.

Actual results:

AWSEndpointService cannot be deleted by the hypershift operator, the uninstallation is stuck

Expected results:

There are no leftover active AWSEndpoint connections when deleting the AWSEndpointService and it can be deleted properly.

OR

Hypershift operator rejects active endpoint connections when trying to delete AWSEndpointServices from the management cluster aws account

Additional info:

Must-gathers added in a comment.
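A hedged manual-cleanup sketch from the management cluster's AWS account, in case the operator does not reject the connection itself (service and endpoint IDs are placeholders):

# Find connections still attached to the endpoint service ...
$ aws ec2 describe-vpc-endpoint-connections --filters Name=service-id,Values=vpce-svc-0123456789abcdef0
# ... and reject them so the endpoint service can be deleted.
$ aws ec2 reject-vpc-endpoint-connections --service-id vpce-svc-0123456789abcdef0 --vpc-endpoint-ids vpce-0123456789abcdef0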

Description of the problem:

When an ICSP is provided in the install config (to cache images locally) while also using the SaaS, the cluster fails to prepare for installation because oc adm release extract tries to use the ICSP from the install config.

How reproducible:

100% on a fresh deployment; 0% if the installer cache is already warmed up.

Steps to reproduce:

1. Deploy fresh replicas to the SaaS environment

2. Create a cluster

3. Override the install config and add ICSP content for a registry that is inaccessible from the SaaS

4. Install cluster

Actual results:

 Cluster fails to prepare with an error like:

Failed to prepare the installation due to an unexpected error: failed generating install config for cluster f3e55b14-297d-453b-8ef4-953caebefc67: failed to get installer path: command 'oc adm release extract --command=openshift-install --to=/data/install-config-generate/installercache/quay.io/openshift-release-dev/ocp-release:4.13.0-x86_64 --insecure=false --icsp-file=/tmp/icsp-file1525063401 quay.io/openshift-release-dev/ocp-release:4.13.0-x86_64 --registry-config=/tmp/registry-config882468533' exited with non-zero exit code 1: warning: --icsp-file only applies to images referenced by digest and will be ignored for tags error: unable to read image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:81be8aec46465412abbef5f1ec252ee4a17b043e82d31feac13d25a8a215a2c9: unauthorized: access to the requested resource is not authorized . Please retry later

Expected results:

Installer image is pulled successfully.

Additional Information

This seems to have been introduced in https://github.com/openshift/assisted-service/pull/4115 when we started pulling ICSP information from the install config.

Description of problem:

During an operator installation with the Installation mode set to all namespaces, the "Installed Namespace" dropdown selection is restricted to "openshift-operators" or another specific namespace, if one is recommended by the operator owners.

With the recent* change to allow installing non-latest operator versions, users should be allowed to select any namespace when installing a globally installed operator.

 

Related info:
Operators can now be installed at non-latest versions with the merge of * https://github.com/openshift/console/pull/12743. They require a manual approval, and because of the way InstallPlan upgrades work, this affects all operators installed in that namespace.

 

 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Please review the following PR: https://github.com/openshift/openshift-state-metrics/pull/95

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Sanitize OWNERS/OWNER_ALIASES:

1) OWNERS must have:

component: "Storage / Kubernetes External Components"

2) OWNER_ALIASES must have all team members of the Storage team.

Description of problem:

4.14-e2e-metal-ipi-sdn-bm jobs are failing with 

 

2023-08-29 15:43:27.066 1 ERROR ironic.api.method [None req-00977b71-1b61-4452-8f6c-a43a47b1e92e - - - - - -] Server-side error: "<Future at 0x7fe7b2b86250 state=finished raised OperationalError>". Detail: 
Traceback (most recent call last):
File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/base.py", line 1089, in _commit_impl
self.engine.dialect.do_commit(self.connection)
File "/usr/lib64/python3.9/site-packages/sqlalchemy/engine/default.py", line 686, in do_commit
dbapi_connection.commit()
sqlite3.OperationalError: database is locked

 

Description of problem:

The E2E test suite is failing with the error below:

Falling back to built-in suite, failed reading external test suites: unable to extract k8s-tests binary: failed extracting "/usr/bin/k8s-tests" from "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f98d9998691052cb8049f806f8c1dc9a6bac189c10c33af9addd631eedfb5528": exit status 1
No manifest filename passed
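The extraction step can be reproduced in isolation to rule out the test harness; a sketch, assuming a valid pull secret for the digest from the error above:

$ oc image extract \
    quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f98d9998691052cb8049f806f8c1dc9a6bac189c10c33af9addd631eedfb5528 \
    --path /usr/bin/k8s-tests:/tmp \
    --registry-config=/path/to/pull-secret.json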

Version-Release number of selected component (if applicable):

4.14

How reproducible:

So far with 4.14 clusters on Power

Steps to Reproduce:

1. Deploy 4.14 cluster on Power
2. Run e2e test suite from - https://github.com/openshift/origin
3. Monitor e2e

Actual results:

E2E test failed

Expected results:

E2E should pass

Additional info:

./openshift-tests run -f ./test-suite.txt -o /tmp/conformance-parallel-out.txt
warning: KUBE_TEST_REPO_LIST may not be set when using openshift-tests and will be ignored
openshift-tests version: v4.1.0-6960-gd9cf51f
  Aug  9 00:48:21.959: INFO: Enabling in-tree volume drivers
Attempting to pull tests from external binary...
Falling back to built-in suite, failed reading external test suites: unable to extract k8s-tests binary: failed extracting "/usr/bin/k8s-tests" from "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f98d9998691052cb8049f806f8c1dc9a6bac189c10c33af9addd631eedfb5528": exit status 1
creating a TCP service service-test with type=LoadBalancer in namespace e2e-service-lb-test-bvmbl
  Aug  9 00:48:35.424: INFO: Waiting up to 15m0s for service "service-test" to have a LoadBalancer
  Aug  9 00:48:36.272: INFO: ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/new started responding to GET requests over new connections
  Aug  9 00:48:36.272: INFO: ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/reused started responding to GET requests over reused connections
  Aug  9 00:48:36.310: INFO: ns/openshift-console route/console disruption/ingress-to-console connection/new started responding to GET requests over new connections
  Aug  9 00:48:36.310: INFO: ns/openshift-console route/console disruption/ingress-to-console connection/reused started responding to GET requests over reused connections
  Aug  9 01:04:07.507: INFO: disruption/ci-cluster-network-liveness connection/reused started responding to GET requests over reused connections
  Aug  9 01:04:07.507: INFO: disruption/ci-cluster-network-liveness connection/new started responding to GET requests over new connections
Starting SimultaneousPodIPController
  I0809 01:04:37.551879  134117 shared_informer.go:311] Waiting for caches to sync for SimultaneousPodIPController
  Aug  9 01:04:37.558: INFO: ns/openshift-image-registry route/test-disruption-reused disruption/image-registry connection/reused started responding to GET requests over reused connections
  Aug  9 01:04:37.624: INFO: disruption/cache-kube-api connection/new started responding to GET requests over new connections
  E0809 01:04:37.719406  134117 shared_informer.go:314] unable to sync caches for SimultaneousPodIPController
Suite run returned error: error waiting for load balancer: timed out waiting for service "service-test" to have a load balancer: timed out waiting for the condition
disruption/kube-api connection/new producer sampler context is done
disruption/cache-kube-api connection/reused producer sampler context is done
disruption/oauth-api connection/new producer sampler context is done
disruption/oauth-api connection/reused producer sampler context is done
ERRO[0975] disruption sample failed: context canceled    auditID=464fb276-71b0-48bf-8fb4-3099ae37cedf backend=oauth-api type=reused
disruption/cache-kube-api connection/new producer sampler context is done
disruption/openshift-api connection/reused producer sampler context is done
disruption/cache-openshift-api connection/reused producer sampler context is done
ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/new producer sampler context is done
ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/reused producer sampler context is done
ns/openshift-console route/console disruption/ingress-to-console connection/new producer sampler context is done
disruption/ci-cluster-network-liveness connection/reused producer sampler context is done
disruption/ci-cluster-network-liveness connection/new producer sampler context is done
ns/openshift-image-registry route/test-disruption-new disruption/image-registry connection/new producer sampler context is done
ns/openshift-image-registry route/test-disruption-reused disruption/image-registry connection/reused producer sampler context is done
ns/openshift-console route/console disruption/ingress-to-console connection/reused producer sampler context is done
disruption/kube-api connection/reused producer sampler context is done
disruption/openshift-api connection/new producer sampler context is done
disruption/cache-openshift-api connection/new producer sampler context is done
disruption/cache-oauth-api connection/reused producer sampler context is done

disruption/cache-oauth-api connection/new producer sampler context is done
Shutting down SimultaneousPodIPController
SimultaneousPodIPController shut down
No manifest filename passed
error running options: error waiting for load balancer: timed out waiting for service "service-test" to have a load balancer: timed out waiting for the condition
error: error waiting for load balancer: timed out waiting for service "service-test" to have a load balancer: timed out waiting for the condition

Description of problem:

In the assisted-installer flow the bootkube service is started on the Live ISO, so the root FS is read-only. The OKD installer attempts to pivot the booted OS to machine-os-content via `rpm-ostree rebase`. This is not necessary since the Live ISO already runs SCOS.
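A minimal sketch of the kind of guard the bootkube flow could use to skip the pivot; the exact os-release identifier for SCOS and the image variable are assumptions:

# Skip the rpm-ostree rebase when the booted live system is already SCOS.
source /etc/os-release
if [ "${ID}" = "scos" ]; then
  echo "Already running SCOS; skipping machine-os-content pivot"
else
  rpm-ostree rebase "${MACHINE_OS_CONTENT}"   # placeholder variable for the target image
fi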

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Description of problem:

The test for updating the sysctl whitelist does not check the error returned when the pod's running state is verified.

As a result the test always passes, and we failed to detect a bug in the cluster network operator's allowlist controller.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Agent-based install on vSphere with multiple workers fails

Version-Release number of selected component (if applicable):

4.13.4

How reproducible:

Always

Steps to Reproduce:

1. Create agent-config, install-config for 3 master, 3+ worker cluster
2. Create Agent ISO image
3. Boot targets from Agent ISO 

Actual results:

Deployment hangs waiting on cluster operators

Expected results:

Deployment completes

Additional info:

Multiple pods cannot start due to tainted nodes: "4 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}"
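A hedged diagnostic/workaround sketch (node name is a placeholder); the taint is normally removed by the cloud provider once it initializes the node, so removing it by hand only unblocks scheduling and does not address the root cause:

# Show the taints on each node:
$ oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
# Remove the taint from a worker (workaround only):
$ oc adm taint nodes <worker-node> node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-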

Description of problem:
The Builds navigation item is missing in the Developer perspective

Version-Release number of selected component (if applicable):
4.14.0

How reproducible:
Always

Steps to Reproduce:

  1. Open the developer perspective on a cluster with BuildConfigs (default)

Actual results:
"Builds" is missing as a navigation item below "Search".

Expected results:
"Builds" navigation item should be displayed again when BuildConfigs CRD is available.

Additional info:
It might have been dropped with PR https://github.com/openshift/console/pull/13097

This is a clone of issue OCPBUGS-19494. The following is the description of the original issue:

Description of problem:

The ipsec container kills pluto even if it was started by systemd

Version-Release number of selected component (if applicable):

on any 4.14 nightly

How reproducible:

every time 

Steps to Reproduce:

1. enable N-S ipsec
2. enable E-W IPsec
3. kill/stop/delete one of the ipsec-host pods

Actual results:

pluto is killed on that host

Expected results:

pluto keeps running

Additional info:

https://github.com/yuvalk/cluster-network-operator/blob/37d1cc72f4f6cd999046bd487a705e6da31301a5/bindata/network/ovn-kubernetes/common/ipsec-host.yaml#L235
this should be removed
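A quick way to confirm the behaviour on an affected node (libreswan's systemd unit is ipsec.service); the check is whether pluto stays up after the pod is deleted:

$ oc debug node/<node-name> -- chroot /host systemctl status ipsec.service
# delete the ipsec-host pod running on that node (as in step 3 above), then:
$ oc debug node/<node-name> -- chroot /host pgrep -a pluto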

Description of problem:

The Machine and its corresponding Node should indicate the proper zone, but the Machine does not indicate the proper zone on a cluster with multiple vCenter zones.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-02-07-064924

How reproducible:

always

Steps to Reproduce:

1.Create a multiple vCenter zones cluster 

sh-4.4$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.nightly-2023-02-07-064924   True        False         58m     Cluster version is 4.13.0-0.nightly-2023-02-07-064924
sh-4.4$ oc get machine
NAME                           PHASE     TYPE   REGION    ZONE   AGE
jima15b-x4584-master-0         Running          us-east          88m
jima15b-x4584-master-1         Running          us-east          88m
jima15b-x4584-master-2         Running          us-west          88m
jima15b-x4584-worker-0-26hml   Running          us-east          81m
jima15b-x4584-worker-1-zljp8   Running          us-east          81m
jima15b-x4584-worker-2-kkdzf   Running          us-west          81m

2.Check machine labels and node labels 
sh-4.4$ oc get machine jima15b-x4584-worker-0-26hml -oyaml 
apiVersion: machine.openshift.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.openshift.io/instance-state: poweredOn
  creationTimestamp: "2023-02-09T02:28:03Z"
  finalizers:
  - machine.machine.openshift.io
  generateName: jima15b-x4584-worker-0-
  generation: 2
  labels:
    machine.openshift.io/cluster-api-cluster: jima15b-x4584
    machine.openshift.io/cluster-api-machine-role: worker
    machine.openshift.io/cluster-api-machine-type: worker
    machine.openshift.io/cluster-api-machineset: jima15b-x4584-worker-0
    machine.openshift.io/region: us-east
    machine.openshift.io/zone: ""
  name: jima15b-x4584-worker-0-26hml
  namespace: openshift-machine-api

sh-4.4$ oc get node jima15b-x4584-worker-0-26hml --show-labels
NAME                           STATUS   ROLES    AGE    VERSION           LABELS
jima15b-x4584-worker-0-26hml   Ready    worker   9m4s   v1.26.0+9eb81c2   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east,failure-domain.beta.kubernetes.io/zone=us-east-1a,kubernetes.io/arch=amd64,kubernetes.io/hostname=jima15b-x4584-worker-0-26hml,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=vsphere-vm.cpu-4.mem-16gb.os-unknown,node.openshift.io/os_id=rhcos,topology.csi.vmware.com/openshift-region=us-east,topology.csi.vmware.com/openshift-zone=us-east-1a,topology.kubernetes.io/region=us-east,topology.kubernetes.io/zone=us-east-1a

Actual results:

The Machine does not indicate the proper zone; it has machine.openshift.io/zone: ""

Expected results:

The Machine should indicate the proper zone

Additional info:

Discussed here https://redhat-internal.slack.com/archives/GE2HQ9QP4/p1675848293159359
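A quick comparison of the two labels (names taken from the output above) shows the mismatch; the failure-domain definitions themselves can be inspected on the Infrastructure object (field path per the 4.13 multi-zone vSphere configuration):

$ oc -n openshift-machine-api get machine jima15b-x4584-worker-0-26hml -o yaml | grep machine.openshift.io/zone
$ oc get node jima15b-x4584-worker-0-26hml -o yaml | grep topology.kubernetes.io/zone
$ oc get infrastructure cluster -o yaml | grep -A 3 failureDomains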

Description of problem

CI is flaky because the TestClientTLS test fails.

Version-Release number of selected component (if applicable)

I have seen these failures in 4.13 and 4.14 CI jobs.

How reproducible

Presently, search.ci reports the following stats for the past 14 days:

Found in 16.07% of runs (20.93% of failures) across 56 total runs and 13 jobs (76.79% failed) in 185ms

Steps to Reproduce

1. Post a PR and have bad luck.
2. Check https://search.ci.openshift.org/?search=FAIL%3A+TestAll%2Fparallel%2FTestClientTLS&maxAge=336h&context=1&type=all&name=cluster-ingress-operator&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job.

Actual results

The test fails:

=== RUN   TestAll/parallel/TestClientTLS
=== PAUSE TestAll/parallel/TestClientTLS
=== CONT  TestAll/parallel/TestClientTLS
=== CONT  TestAll/parallel/TestClientTLS
        stdout:
        Healthcheck requested
        200

        stderr:
        * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache
        * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/
        * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache
        *   Trying 172.30.53.236...
        * TCP_NODELAY set
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed

        * ALPN, offering h2
        * ALPN, offering http/1.1
        * successfully set certificate verify locations:
        *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
          CApath: none
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Client hello (1):
        } [512 bytes data]
        * TLSv1.3 (IN), TLS handshake, Server hello (2):
        { [122 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
        { [10 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Request CERT (13):
        { [82 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Certificate (11):
        { [1763 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, CERT verify (15):
        { [264 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Finished (20):
        { [36 bytes data]
        * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Certificate (11):
        } [8 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Finished (20):
        } [36 bytes data]
        * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
        * ALPN, server did not agree to a protocol
        * Server certificate:
        *  subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        *  start date: Mar 22 18:55:46 2023 GMT
        *  expire date: Mar 21 18:55:47 2025 GMT
        *  issuer: CN=ingress-operator@1679509964
        *  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS app data, [no content] (0):
        } [1 bytes data]
        > GET / HTTP/1.1
        > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        > User-Agent: curl/7.61.1
        > Accept: */*
        >
        { [5 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
        { [313 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
        { [313 bytes data]
        * TLSv1.3 (IN), TLS app data, [no content] (0):
        { [1 bytes data]
        < HTTP/1.1 200 OK
        < x-request-port: 8080
        < date: Wed, 22 Mar 2023 18:56:24 GMT
        < content-length: 22
        < content-type: text/plain; charset=utf-8
        < set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=683c60a6110214134bed475edc895cb9; path=/; HttpOnly; Secure; SameSite=None
        < cache-control: private
        <
        { [22 bytes data]

        * Connection #0 to host canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com left intact

        stdout:
        Healthcheck requested
        200

        stderr:
        * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache
        * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/
        * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache
        *   Trying 172.30.53.236...
        * TCP_NODELAY set
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed

        * ALPN, offering h2
        * ALPN, offering http/1.1
        * successfully set certificate verify locations:
        *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
          CApath: none
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Client hello (1):
        } [512 bytes data]
        * TLSv1.3 (IN), TLS handshake, Server hello (2):
        { [122 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
        { [10 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Request CERT (13):
        { [82 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Certificate (11):
        { [1763 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, CERT verify (15):
        { [264 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Finished (20):
        { [36 bytes data]
        * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Certificate (11):
        } [799 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, CERT verify (15):
        } [264 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Finished (20):
        } [36 bytes data]
        * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
        * ALPN, server did not agree to a protocol
        * Server certificate:
        *  subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        *  start date: Mar 22 18:55:46 2023 GMT
        *  expire date: Mar 21 18:55:47 2025 GMT
        *  issuer: CN=ingress-operator@1679509964
        *  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS app data, [no content] (0):
        } [1 bytes data]
        > GET / HTTP/1.1
        > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        > User-Agent: curl/7.61.1
        > Accept: */*
        >
        { [5 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
        { [1097 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
        { [1097 bytes data]
        * TLSv1.3 (IN), TLS app data, [no content] (0):
        { [1 bytes data]
        < HTTP/1.1 200 OK
        < x-request-port: 8080
        < date: Wed, 22 Mar 2023 18:56:24 GMT
        < content-length: 22
        < content-type: text/plain; charset=utf-8
        < set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=eb40064e54af58007f579a6c82f2bcd7; path=/; HttpOnly; Secure; SameSite=None
        < cache-control: private
        <
        { [22 bytes data]

        * Connection #0 to host canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com left intact

        stdout:
        Healthcheck requested
        200

        stderr:
        * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache
        * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/
        * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache
        *   Trying 172.30.53.236...
        * TCP_NODELAY set
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed

        * ALPN, offering h2
        * ALPN, offering http/1.1
        * successfully set certificate verify locations:
        *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
          CApath: none
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Client hello (1):
        } [512 bytes data]
        * TLSv1.3 (IN), TLS handshake, Server hello (2):
        { [122 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
        { [10 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Request CERT (13):
        { [82 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Certificate (11):
        { [1763 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, CERT verify (15):
        { [264 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Finished (20):
        { [36 bytes data]
        * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Certificate (11):
        } [802 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, CERT verify (15):
        } [264 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Finished (20):
        } [36 bytes data]
        * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
        * ALPN, server did not agree to a protocol
        * Server certificate:
        *  subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        *  start date: Mar 22 18:55:46 2023 GMT
        *  expire date: Mar 21 18:55:47 2025 GMT
        *  issuer: CN=ingress-operator@1679509964
        *  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS app data, [no content] (0):
        } [1 bytes data]
        > GET / HTTP/1.1
        > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        > User-Agent: curl/7.61.1
        > Accept: */*
        >
        { [5 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
        { [1097 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
        { [1097 bytes data]
        * TLSv1.3 (IN), TLS app data, [no content] (0):
        { [1 bytes data]
        < HTTP/1.1 200 OK
        < x-request-port: 8080
        < date: Wed, 22 Mar 2023 18:56:25 GMT
        < content-length: 22
        < content-type: text/plain; charset=utf-8
        < set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=104beed63d6a19782a5559400bd972b6; path=/; HttpOnly; Secure; SameSite=None
        < cache-control: private
        <
        { [22 bytes data]

        * Connection #0 to host canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com left intact

        stdout:
        000

        stderr:
        * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache
        * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/
        * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache
        *   Trying 172.30.53.236...
        * TCP_NODELAY set
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed

        * ALPN, offering h2
        * ALPN, offering http/1.1
        * successfully set certificate verify locations:
        *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
          CApath: none
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Client hello (1):
        } [512 bytes data]
        * TLSv1.3 (IN), TLS handshake, Server hello (2):
        { [122 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
        { [10 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Request CERT (13):
        { [82 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Certificate (11):
        { [1763 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, CERT verify (15):
        { [264 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Finished (20):
        { [36 bytes data]
        * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Certificate (11):
        } [799 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, CERT verify (15):
        } [264 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Finished (20):
        } [36 bytes data]
        * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
        * ALPN, server did not agree to a protocol
        * Server certificate:
        *  subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        *  start date: Mar 22 18:55:46 2023 GMT
        *  expire date: Mar 21 18:55:47 2025 GMT
        *  issuer: CN=ingress-operator@1679509964
        *  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS app data, [no content] (0):
        } [1 bytes data]
        > GET / HTTP/1.1
        > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        > User-Agent: curl/7.61.1
        > Accept: */*
        >
        { [5 bytes data]
        * TLSv1.3 (IN), TLS alert, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS alert, unknown CA (560):
        { [2 bytes data]
        * OpenSSL SSL_read: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca, errno 0

        * Closing connection 0
        curl: (56) OpenSSL SSL_read: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca, errno 0

=== CONT  TestAll/parallel/TestClientTLS
        stdout:
        000

        stderr:
        * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache
        * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/
        * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache
        *   Trying 172.30.53.236...
        * TCP_NODELAY set
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed

        * ALPN, offering h2
        * ALPN, offering http/1.1
        * successfully set certificate verify locations:
        *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
          CApath: none
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Client hello (1):
        } [512 bytes data]
        * TLSv1.3 (IN), TLS handshake, Server hello (2):
        { [122 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
        { [10 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Request CERT (13):
        { [82 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Certificate (11):
        { [1763 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, CERT verify (15):
        { [264 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Finished (20):
        { [36 bytes data]
        * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Certificate (11):
        } [8 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Finished (20):
        } [36 bytes data]
        * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
        * ALPN, server did not agree to a protocol
        * Server certificate:
        *  subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        *  start date: Mar 22 18:55:46 2023 GMT
        *  expire date: Mar 21 18:55:47 2025 GMT
        *  issuer: CN=ingress-operator@1679509964
        *  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS app data, [no content] (0):
        } [1 bytes data]
        > GET / HTTP/1.1
        > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        > User-Agent: curl/7.61.1
        > Accept: */*
        >
        { [5 bytes data]
        * TLSv1.3 (IN), TLS alert, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS alert, unknown (628):
        { [2 bytes data]
        * OpenSSL SSL_read: error:1409445C:SSL routines:ssl3_read_bytes:tlsv13 alert certificate required, errno 0

        * Closing connection 0
        curl: (56) OpenSSL SSL_read: error:1409445C:SSL routines:ssl3_read_bytes:tlsv13 alert certificate required, errno 0

=== CONT  TestAll/parallel/TestClientTLS
        stdout:
        Healthcheck requested
        200

        stderr:
        * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache
        * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/
        * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache
        *   Trying 172.30.53.236...
        * TCP_NODELAY set
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed

        * ALPN, offering h2
        * ALPN, offering http/1.1
        * successfully set certificate verify locations:
        *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
          CApath: none
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Client hello (1):
        } [512 bytes data]
        * TLSv1.3 (IN), TLS handshake, Server hello (2):
        { [122 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
        { [10 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Request CERT (13):
        { [82 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Certificate (11):
        { [1763 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, CERT verify (15):
        { [264 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Finished (20):
        { [36 bytes data]
        * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Certificate (11):
        } [799 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, CERT verify (15):
        } [264 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Finished (20):
        } [36 bytes data]
        * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
        * ALPN, server did not agree to a protocol
        * Server certificate:
        *  subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        *  start date: Mar 22 18:55:46 2023 GMT
        *  expire date: Mar 21 18:55:47 2025 GMT
        *  issuer: CN=ingress-operator@1679509964
        *  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS app data, [no content] (0):
        } [1 bytes data]
        > GET / HTTP/1.1
        > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        > User-Agent: curl/7.61.1
        > Accept: */*
        >
        { [5 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
        { [1097 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
        { [1097 bytes data]
        * TLSv1.3 (IN), TLS app data, [no content] (0):
        { [1 bytes data]
        < HTTP/1.1 200 OK
        < x-request-port: 8080
        < date: Wed, 22 Mar 2023 18:57:00 GMT
        < content-length: 22
        < content-type: text/plain; charset=utf-8
        < set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=683c60a6110214134bed475edc895cb9; path=/; HttpOnly; Secure; SameSite=None
        < cache-control: private
        <
        { [22 bytes data]

        * Connection #0 to host canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com left intact

=== CONT  TestAll/parallel/TestClientTLS
        stdout:
        Healthcheck requested
        200

        stderr:
        * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache
        * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/
        * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache
        *   Trying 172.30.53.236...
        * TCP_NODELAY set
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed

        * ALPN, offering h2
        * ALPN, offering http/1.1
        * successfully set certificate verify locations:
        *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
          CApath: none
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Client hello (1):
        } [512 bytes data]
        * TLSv1.3 (IN), TLS handshake, Server hello (2):
        { [122 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
        { [10 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Request CERT (13):
        { [82 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Certificate (11):
        { [1763 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, CERT verify (15):
        { [264 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Finished (20):
        { [36 bytes data]
        * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Certificate (11):
        } [802 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, CERT verify (15):
        } [264 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Finished (20):
        } [36 bytes data]
        * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
        * ALPN, server did not agree to a protocol
        * Server certificate:
        *  subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        *  start date: Mar 22 18:55:46 2023 GMT
        *  expire date: Mar 21 18:55:47 2025 GMT
        *  issuer: CN=ingress-operator@1679509964
        *  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS app data, [no content] (0):
        } [1 bytes data]
        > GET / HTTP/1.1
        > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        > User-Agent: curl/7.61.1
        > Accept: */*
        >
        { [5 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
        { [1097 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
        { [1097 bytes data]
        * TLSv1.3 (IN), TLS app data, [no content] (0):
        { [1 bytes data]
        < HTTP/1.1 200 OK
        < x-request-port: 8080
        < date: Wed, 22 Mar 2023 18:57:00 GMT
        < content-length: 22
        < content-type: text/plain; charset=utf-8
        < set-cookie: c6e529a6ab19a530fd4f1cceb91c08a9=eb40064e54af58007f579a6c82f2bcd7; path=/; HttpOnly; Secure; SameSite=None
        < cache-control: private
        <
        { [22 bytes data]

        * Connection #0 to host canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com left intact

=== CONT  TestAll/parallel/TestClientTLS
        stdout:
        000

        stderr:
        * Added canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com:443:172.30.53.236 to DNS cache
        * Rebuilt URL to: https://canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com/
        * Hostname canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com was found in DNS cache
        *   Trying 172.30.53.236...
        * TCP_NODELAY set
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed

        * ALPN, offering h2
        * ALPN, offering http/1.1
        * successfully set certificate verify locations:
        *   CAfile: /etc/pki/tls/certs/ca-bundle.crt
          CApath: none
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Client hello (1):
        } [512 bytes data]
        * TLSv1.3 (IN), TLS handshake, Server hello (2):
        { [122 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
        { [10 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Request CERT (13):
        { [82 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Certificate (11):
        { [1763 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, CERT verify (15):
        { [264 bytes data]
        * TLSv1.3 (IN), TLS handshake, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS handshake, Finished (20):
        { [36 bytes data]
        * TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Certificate (11):
        } [799 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, CERT verify (15):
        } [264 bytes data]
        * TLSv1.3 (OUT), TLS handshake, [no content] (0):
        } [1 bytes data]
        * TLSv1.3 (OUT), TLS handshake, Finished (20):
        } [36 bytes data]
        * SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
        * ALPN, server did not agree to a protocol
        * Server certificate:
        *  subject: CN=*.client-tls.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        *  start date: Mar 22 18:55:46 2023 GMT
        *  expire date: Mar 21 18:55:47 2025 GMT
        *  issuer: CN=ingress-operator@1679509964
        *  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
        } [5 bytes data]
        * TLSv1.3 (OUT), TLS app data, [no content] (0):
        } [1 bytes data]
        > GET / HTTP/1.1
        > Host: canary-openshift-ingress-canary.apps.ci-op-21xplx9n-43abb.origin-ci-int-aws.dev.rhcloud.com
        > User-Agent: curl/7.61.1
        > Accept: */*
        >
        { [5 bytes data]
        * TLSv1.3 (IN), TLS alert, [no content] (0):
        { [1 bytes data]
        * TLSv1.3 (IN), TLS alert, unknown CA (560):
        { [2 bytes data]
        * OpenSSL SSL_read: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca, errno 0

        * Closing connection 0
        curl: (56) OpenSSL SSL_read: error:14094418:SSL routines:ssl3_read_bytes:tlsv1 alert unknown ca, errno 0

=== CONT  TestAll/parallel/TestClientTLS
--- FAIL: TestAll (1538.53s)
    --- FAIL: TestAll/parallel (0.00s)
        --- FAIL: TestAll/parallel/TestClientTLS (123.10s)

Expected results

CI passes, or it fails on a different test.

Additional info

I saw that TestClientTLS failed on the test case with no client certificate and ClientCertificatePolicy set to "Required". My best guess is that the test is racy and is hitting a terminating router pod. The test uses waitForDeploymentComplete to wait until all new pods are available, but perhaps waitForDeploymentComplete should also wait until all old pods are terminated.
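
As an illustration, here is a minimal sketch (assuming a client-go clientset and the standard pod-template-hash label; the helper name, selector handling, and polling interval are illustrative, not the actual origin test code) of how such a wait could also cover terminating old pods:

package e2e

import (
    "context"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/util/wait"
    "k8s.io/client-go/kubernetes"
)

// waitForOldPodsTerminated polls until every pod selected by the deployment's
// label selector carries the current pod-template-hash, i.e. all old router
// pods have finished terminating and can no longer receive canary requests.
func waitForOldPodsTerminated(ctx context.Context, kc kubernetes.Interface, namespace, selector, currentHash string) error {
    return wait.PollImmediateWithContext(ctx, 5*time.Second, 5*time.Minute, func(ctx context.Context) (bool, error) {
        pods, err := kc.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
        if err != nil {
            return false, err
        }
        for _, pod := range pods.Items {
            if pod.Labels["pod-template-hash"] != currentHash {
                // An old pod still exists (possibly Terminating); keep waiting.
                return false, nil
            }
        }
        return true, nil
    })
}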

Description of problem:

The Upgrade Helm Release tab in the OpenShift GUI Developer console does not refresh with updated values.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1. Add below Helm chart repository from CLI

~~~
apiVersion: helm.openshift.io/v1beta1
kind: HelmChartRepository
metadata:
  name: prometheus-community
spec:
  connectionConfig:
    url: 'https://prometheus-community.github.io/helm-charts'
  name: prometheus-community
~~~
2. Go to the GUI and select Developer console --> +Add --> Developer Catalog --> Helm Chart --> Select the Prometheus Helm chart --> Install Helm chart --> From the dropdown of chart versions select 22.3.0 --> Install

3. You will see the image tag as v0.63.0
~~~
    image:
      digest: ''
      pullPolicy: IfNotPresent
      repository: quay.io/prometheus-operator/prometheus-config-reloader
      tag: v0.63.0
~~~ 
4. Once that is installed, go to Helm --> Helm Releases --> Prometheus --> Upgrade --> From the dropdown of chart versions select 22.4.0 --> the page does not refresh with the new value of the tag.

~~~
    image:
      digest: ''
      pullPolicy: IfNotPresent
      repository: quay.io/prometheus-operator/prometheus-config-reloader
      tag: v0.63.0
~~~

NOTE: Before installing the Helm chart, the same steps do update the value when a different version is selected:
Go to the GUI and select Developer console --> +Add --> Developer Catalog --> Helm Chart --> Select the Prometheus Helm chart --> Install Helm chart --> From the dropdown of chart versions select 22.3.0 --> Now select a different chart version such as 22.7.0 or 22.4.0

Actual results:

The YAML view of the Upgrade Helm Release tab shows the values of the older chart version.

Expected results:

The YAML view of the Upgrade Helm Release tab should contain the latest values for the selected chart version.

Additional info:

 

Version:

$ openshift-install version

./openshift-install 4.9.11
built from commit 4ee186bb88bf6aeef8ccffd0b5d4e98e9ddd895f
release image quay.io/openshift-release-dev/ocp-release@sha256:0f72e150329db15279a1aeda1286c9495258a4892bc5bf1bf5bb89942cd432de
release architecture amd64

Platform: Openstack

install type: IPI

What happened?

Image streams use the Swift container to store their images. After running many image streams, the Swift container accumulates a huge number of objects, and destroying the cluster then takes a very long time, proportional to the size of the container.

What did you expect to happen?

The destroy command should clean up the resources in a reasonable amount of time.

How to reproduce it (as minimally and precisely as possible)?

Deploy OCP, run a workload that creates a lot of image streams, then destroy the cluster; the destroy command takes a very long time to complete.

Anything else we need to know?

Here is the output of the swift stat command and the time it took to complete the destroy job:

$ swift stat vlan609-26jxm-image-registry-nseyclolgfgxoaiysrlejlhvoklcawbxt
Account: AUTH_2b4d979a2a9e4cf88b2509e9c5e0e232
Container: vlan609-26jxm-image-registry-nseyclolgfgxoaiysrlejlhvoklcawbxt
Objects: 723756
Bytes: 652448740473
Read ACL:
Write ACL:
Sync To:
Sync Key:
Meta Name: vlan609-26jxm-image-registry-nseyclolgfgxoaiysrlejlhvoklcawbxt
Meta Openshiftclusterid: vlan609-26jxm
Content-Type: application/json; charset=utf-8
X-Timestamp: 1640248399.77606
Last-Modified: Thu, 23 Dec 2021 08:34:48 GMT
Accept-Ranges: bytes
X-Storage-Policy: Policy-0
X-Trans-Id: txb0717d5198e344a5a095d-0061c93b70
X-Openstack-Request-Id: txb0717d5198e344a5a095d-0061c93b70

Time took to complete the destroy: 6455.42s

Description of problem:

When listing installed operators, we attempt to list subscriptions in all namespaces in order to associate subscriptions with CSVs. This prevents users without cluster-scope list privileges from seeing subscriptions on this page, which makes the uninstall action unavailable.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Install a namespaced operator
2. Log in as a user with project admin permissions in the namespace where the operator was installed
3. Visit the installed operators page
4. Click the kebab menu for the operator from step 1

Actual results:

The only action available is to delete the CSV

Expected results:

The "Uninstall Operator" and "Edit Subscriptions" actions should show since the user has permission to view, edit, delete Subscription resources in this namespace.

Additional info:

 

Description of problem:

"oc adm upgrade --to-multi-arch" command have no guard in cases where there's cluster conditions that may interfere with the transition, such as:
Invalid=True, Failing=True, and Progressing=True

Steps to Reproduce:

Run the command either while an upgrade is in progress, or while cluster conditions such as Invalid=True or Failing=True are present.

Actual results:

The command is accepted.

Expected results:

The command warns about the interfering condition and proceeds only if --allow-upgrade-with-warnings is set.
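
As an illustration only (not the actual oc implementation; the condition names are taken from this description and the flag handling is assumed), the guard could look roughly like this:

package multiarch

import (
    "fmt"

    configv1 "github.com/openshift/api/config/v1"
)

// guardTransition returns an error when a potentially interfering ClusterVersion
// condition is set and the user has not opted in with --allow-upgrade-with-warnings.
func guardTransition(cv *configv1.ClusterVersion, allowWithWarnings bool) error {
    interfering := map[configv1.ClusterStatusConditionType]bool{
        "Invalid":     true,
        "Failing":     true,
        "Progressing": true,
    }
    for _, cond := range cv.Status.Conditions {
        if interfering[cond.Type] && cond.Status == configv1.ConditionTrue {
            if allowWithWarnings {
                fmt.Printf("warning: proceeding despite %s=True: %s\n", cond.Type, cond.Message)
                continue
            }
            return fmt.Errorf("condition %s=True may interfere with --to-multi-arch (%s); pass --allow-upgrade-with-warnings to proceed anyway", cond.Type, cond.Message)
        }
    }
    return nil
}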

 

Description of problem:

I attempted to install a baremetal SNO with the agent-based installer.
In the install-config, I disabled all supported capabilities except marketplace. install-config snippet:

capabilities:
  baselineCapabilitySet: None
  additionalEnabledCapabilities:
  - marketplace

The system installed fine but the capabilities config was not passed down to the cluster. 

clusterversion: 
status:
    availableUpdates: null
    capabilities:
      enabledCapabilities:
      - CSISnapshot
      - Console
      - Insights
      - Storage
      - baremetal
      - marketplace
      - openshift-samples
      knownCapabilities:
      - CSISnapshot
      - Console
      - Insights
      - Storage
      - baremetal
      - marketplace
      - openshift-samples

oc -n kube-system get configmap cluster-config-v1 -o yaml
apiVersion: v1
data:
  install-config: |
    additionalTrustBundlePolicy: Proxyonly
    apiVersion: v1
    baseDomain: ptp.lab.eng.bos.redhat.com
    bootstrapInPlace:
      installationDisk: /dev/disk/by-id/wwn-0x62cea7f04d10350026c6f2ec315557a0
    compute:
    - architecture: amd64
      hyperthreading: Enabled
      name: worker
      platform: {}
      replicas: 0
    controlPlane:
      architecture: amd64
      hyperthreading: Enabled
      name: master
      platform: {}
      replicas: 1
    metadata:
      creationTimestamp: null
      name: cnfde8
    networking:
      clusterNetwork:
      - cidr: 10.128.0.0/14
        hostPrefix: 23
      machineNetwork:
      - cidr: 10.16.231.0/24
      networkType: OVNKubernetes
      serviceNetwork:
      - 172.30.0.0/16
    platform:
      none: {}
    publish: External
    pullSecret: ""





Version-Release number of selected component (if applicable):

4.12.0-rc.5

How reproducible:

100%

Steps to Reproduce:

1. Install SNO with agent based installer as described above
2.
3.

Actual results:

All capabilities were installed; the baselineCapabilitySet: None configuration was not applied.

Expected results:

The disabled capabilities are not installed; only the explicitly enabled ones are.

Additional info:

 

This is a clone of issue OCPBUGS-20369. The following is the description of the original issue:

Description of problem:

Worker CSRs are pending, so no worker nodes are available.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-06-234925

How reproducible:

Always

Steps to Reproduce:

Create a cluster with profile - aws-c2s-ipi-disconnected-private-fips

Actual results:

Worker CSRs are pending.

Expected results:

Workers should be up and running, with all CSRs approved.

Additional info:

The cluster-machine-approver logs show "failed to find machine for node ip-10-143-1-120".

It seems we should have node names like "ip-10-143-1-120.ec2.internal".

The approval is failing here - https://github.com/openshift/cluster-machine-approver/blob/master/pkg/controller/csr_check.go#L263

 

Must-gather - https://drive.google.com/file/d/15tz9TLdTXrH6bSBSfhlIJ1l_nzeFE1R3/view?usp=sharing

cluster - https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/238922/

template for installation - https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_14/ipi-on-aws/versioned-installer-customer_vpc-disconnected_private_cluster-fips-c2s-ci

 

cc Yunfei Jiang Zhaohua Sun 

Description of problem:

The dynamic demo plugin locales are missing a correct plural string. The dynamic demo plugin doesn't make use of the script that the console uses to transform plural strings, so the plural string needs to be updated manually.

This would help with further validation of the i18n dependency update changes, and also with the investigation of the [Dynamic plugin translation support for plurals broken](https://issues.redhat.com/browse/OCPBUGS-11285) bug.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Deploy the dynamic demo plugin on a cluster
2. Go to the Overview page
3.

Actual results:

The Node Worker string is NOT in correct plural format

Expected results:

The node Worker string is in the correct plural format

Additional info:

 

When the user is providing ZTP manifests, a missing value for userManagedNetworking (in AgentClusterInstall) should be defaulted based on the platform type - for platform None this should default to true.

This is only happening if the platform type is misspelled as none instead of None. (Both are accepted for backwards compat with OCPBUGS-7495, but they should not result in different behaviour.)

When the user starts from an install-config, we set the correct value explicitly in the generated AgentClusterInstall, so this is not a problem so long as the user doesn't edit it.
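
A minimal sketch of the intended defaulting, assuming a hypothetical helper (the function and argument names are illustrative, not the actual assisted-service code); the key point is that the platform type is compared case-insensitively so "none" and "None" behave identically:

package defaults

import "strings"

// defaultUserManagedNetworking returns the value to use for userManagedNetworking
// when the user did not set it in AgentClusterInstall. An explicit user value
// always wins; otherwise platform None (in any spelling) defaults to true.
func defaultUserManagedNetworking(platformType string, userValue *bool) bool {
    if userValue != nil {
        return *userValue
    }
    return strings.EqualFold(platformType, "none")
}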

Description of problem:

The CoreDNS template implementation uses an incorrect regex: the dot [.] character is not escaped.

Version-Release number of selected component (if applicable):

NA

How reproducible:

100% when you use router sharding with domains including apps

Steps to Reproduce:

1. Create an additional IngressController with a domain name that includes "apps", for example: example.test-apps.<clustername>.<clusterdomain>
2. Create and configure the external LB corresponding to the additional IngressController.
3. Configure the corporate DNS server and create records for this additional IngressController resolving to the LB IP set up in step 2 above.
4. Try resolving the additional domain routes from outside the cluster and within the cluster. The DNS resolution works fine from outside the cluster. However, within the cluster all additional domains containing "apps" in the domain name resolve to the default ingress VIP instead of their corresponding LB IPs configured on the corporate DNS server.

As a simpler test, you can also reproduce it by running the dig command on a cluster node with the additional domain

for ex: 
sh-4.4# dig test.apps-test..<clustername>.<clusterdomain> 

Actual results:

DNS resolves all domains containing "apps" to the default ingress VIP. For example, example.test-apps.<clustername>.<clusterdomain> resolves to the default ingress VIP instead of its corresponding LB IP.

Expected results:

DNS should resolve it to the corresponding LB IP configured on the DNS server.

Additional info:

The DNS resolution happens via the Corefile templates used on the node, which treat the dot (.) as a regex wildcard character instead of a literal dot [.]. This is a regex configuration bug in the Corefile used on vSphere IPI clusters.
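
A small Go illustration (the pattern and domain are made up and are not the actual Corefile template) of why an unescaped dot matches unintended names:

package main

import (
    "fmt"
    "regexp"
)

func main() {
    // Unescaped: "." matches any character, so "-apps" is caught by a pattern
    // that was only meant to match the literal ".apps." default domain.
    loose := regexp.MustCompile(`.*.apps.mycluster.example.com`)
    // Escaped: "\." matches only a literal dot.
    strict := regexp.MustCompile(`.*\.apps\.mycluster\.example\.com`)

    name := "example.test-apps.mycluster.example.com"
    fmt.Println(loose.MatchString(name))  // true  - wrongly resolved to the default ingress VIP
    fmt.Println(strict.MatchString(name)) // false - left to the corresponding DNS record
}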

Description of problem:

The agent create sub-command shows a fatal error when an invalid sub-command is executed.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Execute `openshift-install agent create invalid`

Actual results:

FATA[0000] Error executing openshift-install: accepts 0 arg(s), received 1 

Expected results:

It should print the help for the create command.
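
A hedged sketch (not the actual openshift-install code; the command wiring is illustrative) of how a cobra sub-command can print its own help instead of exiting with a bare fatal error when unexpected arguments are passed:

package main

import (
    "fmt"
    "os"

    "github.com/spf13/cobra"
)

func newCreateCmd() *cobra.Command {
    cmd := &cobra.Command{
        Use:   "create",
        Short: "Create agent artifacts",
    }
    // Reject unexpected positional arguments by printing the sub-command help
    // before returning the error, rather than exiting with a bare fatal message.
    cmd.Args = func(c *cobra.Command, args []string) error {
        if len(args) > 0 {
            _ = c.Help()
            return fmt.Errorf("unknown argument(s): %v", args)
        }
        return nil
    }
    return cmd
}

func main() {
    if err := newCreateCmd().Execute(); err != nil {
        os.Exit(1)
    }
}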

Additional info:

 

Description of problem:

Cannot list Kepler CSV

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Always

Steps to Reproduce:

1. Install Kepler Community Operator
2. Create Kepler Instance
3. Console gets error and shows "Oops, something went wrong"

Actual results:

Console gets error and shows "Oops, something went wrong"

Expected results:

Should list Kepler Instance

Additional info:

 

Description of problem:

The console-operator's config file gets updated every couple of seconds, with only the `resourceVersion` field changing.

Version-Release number of selected component (if applicable):

4.14-ec-2

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

All etcd replicas should be restored from the same snapshot.

Currently we expect a list of snapshot URLs for restore https://github.com/openshift/hypershift/blob/main/api/v1beta1/hostedcluster_types.go#L1703C49-L1703C50 which needs to be changed to accept only one URL.
https://github.com/openshift/hypershift/issues/2736
Documentation around etcd backup/restore should also reflect this correctly.

Description of problem:

VPC endpoint service cannot be cleaned up by HyperShift operator when the OIDC provider of the customer cluster has been deleted.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Sometimes

Steps to Reproduce:

1.Create a HyperShift hosted cluster
2.Delete the HyperShift cluster's OIDC provider in AWS
3.Delete the HyperShift hosted cluster

Actual results:

Cluster is stuck deleting

Expected results:

Cluster deletes

Additional info:

The hypershift operator is stuck trying to delete the AWS endpoint service but it can't be deleted because it gets an error that there are active connections.

`useExtensions` is not available in the dynamic plugin SDK, which prevents this functionality from being copied to `monitoring-plugin`. `useResolvedExtensions` is available and provides the same functionality, so we should use that instead.

 

Description of the problem:

 4.14 jobs relying on LSO are failing because we should use the version N-1 for LSO.
Something similar to https://github.com/openshift/assisted-service/pull/4753 should be merged.

Actual results:

Job fail with:

 ++ make deploy_assisted_operator test_kube_api_parallel
Error from server (NotFound): namespaces "assisted-spoke-cluster" not found
error: the server doesn't have a resource type "clusterimageset"
namespace "assisted-installer" deleted
error: the server doesn't have a resource type "agentserviceconfigs"
error: the server doesn't have a resource type "localvolume"
Error from server (NotFound): catalogsources.operators.coreos.com "assisted-service-catalog" not found 

 https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-assisted-test-infra-master-e2e-metal-assisted-kube-api-late-binding-single-node

Expected results:

Job should be a success

Description of problem:

documentationBaseURL is still linking to 4.13

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-04-05-183601

How reproducible:

Always

Steps to Reproduce:

1. get documentationBaseURL in cm/console-config
$ oc get cm console-config -n openshift-console -o yaml | grep documentationBaseURL
      documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.13/
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-04-05-183601   True        False         68m     Cluster version is 4.14.0-0.nightly-2023-04-05-183601
2.
3.

Actual results:

documentationBaseURL: https://access.redhat.com/documentation/en-us/openshift_container_platform/4.13/

Expected results:

documentationBaseURL should be  https://access.redhat.com/documentation/en-us/openshift_container_platform/4.14/

Additional info:

 

Description of problem:

Most recent nightly https://amd64.ocp.releases.ci.openshift.org/releasestream/4.14.0-0.nightly/release/4.14.0-0.nightly-2023-04-18-152947 has a lot of OAuth test failures

Example runs:

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-e2e-aws-ovn/1648348911074545664

https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.14-e2e-metal-ipi-sdn-bm/1648348885556400128

Error looks like:

fail [github.com/openshift/origin/test/extended/oauth/expiration.go:105]: Unexpected error:
    <*tls.CertificateVerificationError | 0xc0023b6330>: {
        UnverifiedCertificates: [
            {...


Looking at changes in the last day or so, nothing sticks out to me.

However, I believe ART bumped everything to be built with go1.20, and this error is new to go1.20:

"For a handshake failure due to a certificate verification failure, the TLS client and server now return an error of the new type CertificateVerificationError, which includes the presented certificates." - https://go.dev/doc/go1.20

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-04-18-152947

How reproducible:

Looks repeatable

Steps to Reproduce:

1. Build oauth, origin, and related containers with go1.20 (not clear which is causing the test failure)
2.
3.

Actual results:

Tests fail

Expected results:

 

Additional info:

 

In many cases, the /dev/disk/by-path symlink is the only way to stably identify a disk without having prior knowledge of the hardware from some external source (e.g. a spreadsheet of disk serial numbers). It should be possible to specify this path in the root device hints.
Metal³ is planning to allow these paths in the `name` hint (see OCPBUGS-13080), and assisted's implementation of root device hints (which is used in ZTP and the agent-based installer) should be changed to match.
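
A minimal sketch (illustrative only, not the actual assisted-service hint-matching code) of how a by-path hint could be matched against a block device by resolving the symlink:

package hints

import (
    "path/filepath"
    "strings"
)

// matchesByPathHint reports whether the given device (e.g. "/dev/sda") is the
// target of a /dev/disk/by-path hint. EvalSymlinks resolves the stable by-path
// link to the kernel device name so the two can be compared directly.
func matchesByPathHint(hint, device string) (bool, error) {
    if !strings.HasPrefix(hint, "/dev/disk/by-path/") {
        return false, nil
    }
    resolved, err := filepath.EvalSymlinks(hint)
    if err != nil {
        return false, err
    }
    return resolved == device, nil
}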

Description of problem:


On August 24th, a bugfix was merged into the hypershift repo to address OCPBUGS-16813 (https://github.com/openshift/hypershift/pull/2942). This resulted in a change to the konnectivity server within the HCP namespace: we went from a single konnectivity server to multiple servers when HA HCPs are in use.

The konnectivity agents within the HCP worker nodes connect to the server through a route. When connecting through this route, the agents on the workers are supposed to discover all of the HA konnectivity servers via round-robin load balancing; the theory is that if the agents connect to the route endpoint enough times, they should eventually discover all the servers.

With the kubevirt platform, only a single konnectivity server is discovered by the agents in the worker nodes, which leads to the inability for the kas on the HCP to reliably contact kubelets within the worker nodes.

The outcome of this issue is that webhooks (and other connections that require the kas (API server) in the HCP to contact worker nodes) fail the majority of the time.

Version-Release number of selected component (if applicable):


How reproducible:


create a kubevirt platform HCP using the `hcp` cli tool. This will default to HA mode, and the cluster will never fully roll out. The ingress, monitoring, and console clusteroperators will flap back and forth between failing and success. Usually we'll see an error about webhook connectivity failing.

During this time, any `oc` command that attempts to tunnel a connection through the kas to the kubelets will fail the majority of the time. This means `oc logs`, `oc exec`, etc... will not work. 


Actual results:

kas -> kubelet connections are unreliable

Expected results:


kas -> kubelet connections are reliable

Additional info:


Description of problem:

According to PR https://github.com/openshift/cluster-monitoring-operator/pull/1824, the startupProbe for both UWM Prometheus and platform Prometheus should allow 1 hour. However, after enabling UWM, the startupProbe for UWM Prometheus still only allows 15 minutes (failureThreshold 60 × periodSeconds 15s = 900s). Platform Prometheus does not have this issue; its startupProbe allows 1 hour (failureThreshold 240 × periodSeconds 15s = 3600s).

$ oc -n openshift-user-workload-monitoring get pod prometheus-user-workload-0 -oyaml | grep startupProbe -A20
    startupProbe:
      exec:
        command:
        - sh
        - -c
        - if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready;
          elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready;
          else exit 1; fi
      failureThreshold: 60
      periodSeconds: 15
      successThreshold: 1
      timeoutSeconds: 3
...

$ oc -n openshift-monitoring get pod prometheus-k8s-0 -oyaml | grep startupProbe -A20
    startupProbe:
      exec:
        command:
        - sh
        - -c
        - if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready;
          elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready;
          else exit 1; fi
      failureThreshold: 240
      periodSeconds: 15
      successThreshold: 1
      timeoutSeconds: 3

 

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-03-19-052243

How reproducible:

always

Steps to Reproduce:

1. enable UWM, check startupProbe for UWM prometheus/platform prometheus
2.
3.

Actual results:

startupProbe for UWM prometheus is still 15m

Expected results:

startupProbe for UWM prometheus should be 1 hour

Additional info:

Since the startupProbe for platform Prometheus has been increased to 1 hour and there is no similar bug for UWM Prometheus, closing this issue as won't-fix is acceptable.

Description of problem:

The prometheus-operator pod has the "app.kubernetes.io/version: 0.63.0" annotation while it's based on 0.65.1. 

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Check app.kubernetes.io/version annotations for prometheus-operator pod.
2.
3.

Actual results:

0.63.0

Expected results:

0.65.1

Additional info:

 

Description of problem:


OCP deployments are failing with machine-api-controller pod crashing.

Version-Release number of selected component (if applicable):

OCP 4.14.0-ec.3 

How reproducible:

Always

Steps to Reproduce:

1. Deploy a Baremetal cluster
2. After bootstrap is completed, check the pods running in the openshift-machine-api namespace
3. Check machine-api-controllers-* pod status (it goes from Running to Crashing all the time)
4. Deployment eventually times out and stops with only the master nodes getting deployed.

Actual results:

machine-api-controllers-* pod remains in a crashing loop and OCP 4.14.0-ec.3 deployments fail.

Expected results:

machine-api-controllers-* pod remains running and OCP 4.14.0-ec.3 deployments are completed 

Additional info:

Jobs with older nightly releases in 4.14 are passing, but since Saturday Jul 10th, our CI jobs are failing

$ oc version
Client Version: 4.14.0-ec.3
Kustomize Version: v5.0.1
Kubernetes Version: v1.27.3+e8b13aa

$ oc get nodes
NAME       STATUS   ROLES                  AGE   VERSION
master-0   Ready    control-plane,master   37m   v1.27.3+e8b13aa
master-1   Ready    control-plane,master   37m   v1.27.3+e8b13aa
master-2   Ready    control-plane,master   38m   v1.27.3+e8b13aa

$ oc -n openshift-machine-api get pods -o wide
NAME                                                  READY   STATUS             RESTARTS        AGE   IP              NODE       NOMINATED NODE   READINESS GATES
cluster-autoscaler-operator-75b96869d8-gzthq          2/2     Running            0               48m   10.129.0.6      master-0   <none>           <none>
cluster-baremetal-operator-7c9cb8cd69-6bqcg           2/2     Running            0               48m   10.129.0.7      master-0   <none>           <none>
control-plane-machine-set-operator-6b65b5b865-w996m   1/1     Running            0               48m   10.129.0.22     master-0   <none>           <none>
machine-api-controllers-59694ff965-v4kxb              6/7     CrashLoopBackOff   7 (2m31s ago)   46m   10.130.0.12     master-2   <none>           <none>
machine-api-operator-58b54d7c86-cnx4w                 2/2     Running            0               48m   10.129.0.8      master-0   <none>           <none>
metal3-6ffbb8dcd4-drlq5                               6/6     Running            0               45m   192.168.62.22   master-1   <none>           <none>
metal3-baremetal-operator-bd95b6695-q6k7c             1/1     Running            0               45m   10.130.0.16     master-2   <none>           <none>
metal3-image-cache-4p7ln                              1/1     Running            0               45m   192.168.62.22   master-1   <none>           <none>
metal3-image-cache-lfmb4                              1/1     Running            0               45m   192.168.62.23   master-2   <none>           <none>
metal3-image-cache-txjg5                              1/1     Running            0               45m   192.168.62.21   master-0   <none>           <none>
metal3-image-customization-65cf987f5c-wgqs7           1/1     Running            0               45m   10.128.0.17     master-1   <none>           <none>
$ oc -n openshift-machine-api logs machine-api-controllers-59694ff965-v4kxb -c machine-controller | less
...
E0710 15:55:08.230413       1 logr.go:270] controller-runtime/source "msg"="if kind is a CRD, it should be installed before calling Start" "error"="no matches for kind \"Metal3Remediation\" in version \"infrastructure.cluster.x-k8s.io/v1beta1\""  "kind"={"Group":"infrastructure.cluster.x-k8s.io","Kind":"Metal3Remediation"}
E0710 15:55:14.019930       1 controller.go:210]  "msg"="Could not wait for Cache to sync" "error"="failed to wait for metal3remediation caches to sync: timed out waiting for cache to be synced" "controller"="metal3remediation" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="Metal3Remediation" 
I0710 15:55:14.020025       1 logr.go:252]  "msg"="Stopping and waiting for non leader election runnables"  
I0710 15:55:14.020054       1 logr.go:252]  "msg"="Stopping and waiting for leader election runnables"  
I0710 15:55:14.020095       1 controller.go:247]  "msg"="Shutdown signal received, waiting for all workers to finish" "controller"="machine-drain-controller" 
I0710 15:55:14.020147       1 controller.go:247]  "msg"="Shutdown signal received, waiting for all workers to finish" "controller"="machineset-controller" 
I0710 15:55:14.020169       1 controller.go:247]  "msg"="Shutdown signal received, waiting for all workers to finish" "controller"="machine-controller" 
I0710 15:55:14.020184       1 controller.go:249]  "msg"="All workers finished" "controller"="machineset-controller" 
I0710 15:55:14.020181       1 controller.go:249]  "msg"="All workers finished" "controller"="machine-drain-controller" 
I0710 15:55:14.020190       1 controller.go:249]  "msg"="All workers finished" "controller"="machine-controller" 
I0710 15:55:14.020209       1 logr.go:252]  "msg"="Stopping and waiting for caches"  
I0710 15:55:14.020323       1 logr.go:252]  "msg"="Stopping and waiting for webhooks"  
I0710 15:55:14.020327       1 reflector.go:225] Stopping reflector *v1alpha1.BareMetalHost (10h53m58.149951981s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262
I0710 15:55:14.020393       1 reflector.go:225] Stopping reflector *v1beta1.Machine (9h40m22.116205595s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262
I0710 15:55:14.020399       1 logr.go:252] controller-runtime/webhook "msg"="shutting down webhook server"  
I0710 15:55:14.020437       1 reflector.go:225] Stopping reflector *v1.Node (10h3m14.461941979s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262
I0710 15:55:14.020466       1 logr.go:252]  "msg"="Wait completed, proceeding to shutdown the manager"  
I0710 15:55:14.020485       1 reflector.go:225] Stopping reflector *v1beta1.MachineSet (10h7m28.391827596s) from sigs.k8s.io/controller-runtime/pkg/cache/internal/informers_map.go:262
E0710 15:55:14.020500       1 main.go:218] baremetal-controller-manager/entrypoint "msg"="unable to run manager" "error"="failed to wait for metal3remediation caches to sync: timed out waiting for cache to be synced"  
E0710 15:55:14.020504       1 logr.go:270]  "msg"="error received after stop sequence was engaged" "error"="leader election lost" 

Our CI job logs can be seen here (RedHat SSO): https://www.distributed-ci.io/jobs/7da8ee48-8918-4a97-8e3c-f525d19583b8/files

This is a clone of issue OCPBUGS-21610. The following is the description of the original issue:

Description of problem:

The monitoring-plugin cannot be started on an IPv6-disabled cluster because the pod listens on [::]:9443.

The monitoring-plugin should listen on [::]:9443 on an IPv6-enabled cluster.
The monitoring-plugin should listen on 0.0.0.0:9443 on an IPv6-disabled cluster.


$ oc logs monitoring-plugin-dc84478c-5rwmm
2023/10/14 13:42:41 [emerg] 1#0: socket() [::]:9443 failed (97: Address family not supported by protocol)
nginx: [emerg] socket() [::]:9443 failed (97: Address family not supported

Version-Release number of selected component (if applicable):

4.14.0-rc.5

How reproducible:

Always

Steps to Reproduce:

1) disable ipv6 following   https://access.redhat.com/solutions/5513111

cat <<EOF |oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: master
  name: 99-openshift-machineconfig-master-kargs
spec:
  kernelArguments:
  - ipv6.disable=1
EOF
 
cat <<EOF |oc create -f -
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: 99-openshift-machineconfig-worker-kargs
spec:
  kernelArguments:
  - ipv6.disable=1
EOF

2) Check the mcp status

3) Check the monitoring plugin pod status

Actual results:
1) The MCP is pending because the monitoring-plugin pod cannot be scheduled

 

$ oc get mcp |grep worker.
worker   rendered-worker-ba1d1b8306f65bc5ff53b0c05a54143f   False     True       False      5              3                   3                     0                      3h59m

 

 

$oc logs machine-config-controller-5b96788c69-j9d7k
I1014 13:05:57.767217       1 drain_controller.go:350] Previous node drain found. Drain has been going on for 0.025260005567777778 hours
I1014 13:05:57.767228       1 drain_controller.go:173] node anlim14-c6jbb-worker-b-rgqq5.c.openshift-qe.internal: initiating drain
E1014 13:05:58.411241       1 drain_controller.go:144] WARNING: ignoring DaemonSet-managed ……
I1014 13:05:58.413116       1 drain_controller.go:144] evicting pod openshift-monitoring/monitoring-plugin-dc84478c-92xr4
E1014 13:05:58.422164       1 drain_controller.go:144] error when evicting pods/"monitoring-plugin-dc84478c-92xr4" -n "openshift-monitoring" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I1014 13:06:03.422338       1 drain_controller.go:144] evicting pod openshift-monitoring/monitoring-plugin-dc84478c-92xr4
E1014 13:06:03.433295       1 drain_controller.go:144] error when evicting pods/"monitoring-plugin-dc84478c-92xr4" -n "openshift-monitoring" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.

 

2) The monitoring-plugin pod listens on [::], which is an invalid address on an IPv6-disabled cluster.

 

$oc extract cm/monitoring-plugin
$cat nginx.conf 
error_log /dev/stdout info;
events {}
http {
  include            /etc/nginx/mime.types;
  default_type       application/octet-stream;
  keepalive_timeout  65;
  server {
    listen              9443 ssl;
    listen              [::]:9443 ssl;
    ssl_certificate     /var/cert/tls.crt;
    ssl_certificate_key /var/cert/tls.key;
    root                /usr/share/nginx/html;
  }
}

Expected results:

Monitoring-plugin listens on [::]:9443 on IPv6 enabled cluster
Monitoring-plugin listens on 0.0.0.0:9443 on IPv6 disabled cluster.

Additional info:

This PR shows how logging fixed the same issue: https://github.com/openshift/cluster-logging-operator/pull/2207/files#diff-dc6205a02c6c783e022ae0d4c726327bee4ef34cd1361541d1e3165ee7056b38R43
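
One possible approach, shown here as a hedged Go sketch rather than the actual monitoring-plugin or logging fix, is to probe at startup whether the IPv6 stack is usable and pick the listen address accordingly:

package main

import (
    "fmt"
    "net"
)

// ipv6Available reports whether the node can bind an IPv6 socket at all;
// with ipv6.disable=1 this fails with "address family not supported".
func ipv6Available() bool {
    l, err := net.Listen("tcp6", "[::1]:0")
    if err != nil {
        return false
    }
    l.Close()
    return true
}

func main() {
    listen := "0.0.0.0:9443"
    if ipv6Available() {
        listen = "[::]:9443"
    }
    // The chosen address would then be templated into the nginx config.
    fmt.Println("listening on", listen)
}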

Description of the problem:

In staging, UI 2.18.2, BE 2.18.0 - Day2 add hosts - getting the following error when assigning auto-assign role:

Failed to set role
Requested role (auto-assign) is invalid for host c746e34f-f44a-4291-9064-402ab95b5831 from infraEnv 2b4ee2bf-ee45-4f57-b64e-715bc955f92e

How reproducible:

100%

Steps to reproduce:

1. install day1 cluster

2. In OCM, go to add host and discover new host

3. Assign auto-select role to this host

Actual results:

 

Expected results:

Description of problem:

In the Konnectivity SOCKS proxy: currently the default is to proxy cloud endpoint traffic: https://github.com/openshift/hypershift/blob/main/konnectivity-socks5-proxy/main.go#L61

Due to this after this change: https://github.com/openshift/hypershift/commit/0c52476957f5658cfd156656938ae1d08784b202

The oauth server had a behavior change where it began to proxy iam traffic instead of not proxying it. This causes a regression in Satellite environments running with an HTTP_PROXY server. The original network traffic path needs to be restored

Version-Release number of selected component (if applicable):

4.13 4.12

How reproducible:

100%

Steps to Reproduce:

1. Setup HTTP_PROXY IBM Cloud Satellite environment
2. In the oauth-server pod run a curl against iam (curl -v https://iam.cloud.ibm.com)
3. It will log it is using proxy

Actual results:

It is using proxy 

Expected results:

It should send traffic directly (as it does in 4.11 and 4.10)

Additional info:

 

Description of problem:

On an SNO node one of the CatalogSources gets deleted after multiple reboots.

In the initial stage we have 2 catalogsources:

$ oc get catsrc -A
NAMESPACE NAME DISPLAY TYPE PUBLISHER AGE
openshift-marketplace certified-operators Intel SRIOV-FEC Operator grpc Red Hat 20h
openshift-marketplace redhat-operators Red Hat Operators Catalog grpc Red Hat 18h

After several node reboots, one of the CatalogSources no longer shows up:

$ oc get catsrc -A
NAMESPACE NAME DISPLAY TYPE PUBLISHER AGE
openshift-marketplace certified-operators Intel SRIOV-FEC Operator grpc Red Hat 21h

Version-Release number of selected component (if applicable):
4.11.0-fc.3

How reproducible:
Inconsistent but reproducible

Steps to Reproduce:

1. Deploy and configure SNO node via ZTP process. Configuration sets up 2 CatalogSources in a restricted environment for redhat-operators and certified-operators

- apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    name: certified-operators
    namespace: openshift-marketplace
  spec:
    displayName: Intel SRIOV-FEC Operator
    image: registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/olm/far-edge-sriov-fec:v4.11
    publisher: Red Hat
    sourceType: grpc
- apiVersion: operators.coreos.com/v1alpha1
  kind: CatalogSource
  metadata:
    name: redhat-operators
    namespace: openshift-marketplace
  spec:
    displayName: Red Hat Operators Catalog
    image: registry.kni-qe-0.lab.eng.rdu2.redhat.com:5000/olm/redhat-operators:v4.11
    publisher: Red Hat
    sourceType: grpc

2. Reboot the node via `sudo reboot` several times

3. Check catalogsources

Actual results:

$ oc get catsrc -A
NAMESPACE NAME DISPLAY TYPE PUBLISHER AGE
openshift-marketplace certified-operators Intel SRIOV-FEC Operator grpc Red Hat 22h

Expected results:

All catalogsources created initially are still present.

Additional info:

Attaching must-gather.

Description of problem:

Seen in 4.13.0-rc.2, mcc_drain_err is being served for nodes that have been deleted, causing un-actionable MCDDrainError.

Version-Release number of selected component (if applicable):

At least 4.13.0-rc.2. Further exposure unclear.

How reproducible:

At least four nodes on build01. Possibly all nodes that are removed while suffering from drain issues on 4.13.0-rc.2.

Steps to Reproduce:

Unclear.

Actual results:

The machine-config controller continues to serve mcc_drain_err for the removed nodes.

Expected results:

The machine-config controller never serves mcc_drain_err for non-existent nodes.
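
As a purely illustrative sketch (the metric name mirrors this report, but the handler wiring is hypothetical and not the actual machine-config controller code), the controller could clear the per-node gauge when the node object is deleted:

package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    corev1 "k8s.io/api/core/v1"
)

// mccDrainErr stands in for the per-node mcc_drain_err gauge from this report.
// Registration with a registry is omitted for brevity.
var mccDrainErr = prometheus.NewGaugeVec(
    prometheus.GaugeOpts{Name: "mcc_drain_err", Help: "drain errors per node"},
    []string{"node"},
)

// onNodeDelete would be registered as the node informer's DeleteFunc so that
// the gauge stops being served for nodes that no longer exist.
func onNodeDelete(obj interface{}) {
    node, ok := obj.(*corev1.Node)
    if !ok {
        return
    }
    mccDrainErr.DeleteLabelValues(node.Name)
}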

Description of problem:

The OCP installer's OpenStack Ironic iRMC driver doesn't work with FIPS mode enabled, as it requires the SNMP version to be set to v3. However, there is no way to set the SNMP version parameter in the RHOCP installer yaml file, so it falls back to the default v2, and it fails 100% of the time.

Version-Release number of selected component (if applicable):

Release Number: 14.0-ec.0

Drivers or hardware or architecture dependency:
Deploy baremetal node with BMC using iRMC protocol(When RHOCP installer uses OpenStack Ironic iRMC driver)

Hardware configuration:
Model/Hypervisor: PRIMERGY RX2540 M6
CPU Info: Intel(R) Xeon(R) Gold 5317 CPU @ 3.00GHz
Memory Info: 125G
Hardware Component Information: None
Configuration Info: None
Guest Configuration Info: None

How reproducible:

Always

Steps to Reproduce:

  1. Enable FIPS mode of RHOCP nodes through setting "fips" to "true" at install-config.yaml.
  2. In install-config.yaml, set platform.baremetal.hosts.bmc.address to start with 'irmc://'
  3. Run OpenShift Container Platform installer.

Actual results:

The OpenStack Ironic iRMC driver used in the OpenShift Container Platform installer doesn't work and installation fails. The log message suggests setting the SNMP version parameter of the Ironic iRMC driver to v3 (a non-default value) when FIPS mode is enabled.

Expected results:

When FIPS mode is enabled on RHOCP, OpenStack Ironic iRMC driver used in RHOCP installer checks whether iRMC driver is configured to use SNMP (current OCP installer configures iRMC driver not to use SNMP) and if iRMC driver is configured not to use SNMP, driver doesn't require setting SNMP version parameter to v3 and installation proceeds. If iRMC driver is configured to use SNMP, driver requires setting SNMP version parameter to v3.

Additional info:

When FIPS mode is enabled, installation of RHOCP into Fujitsu server fails
because OpenStack Ironic iRMC driver, which is used in RHOCP installer,
requires iRMC driver's SNMP version parameter to be set to v3 even though
iRMC driver isn't configured to use SNMP and there is no way to set it to v3.

Installing RHOCP with IPI to baremetal node uses install-config.yaml.
User sets configuration related to RHOCP in install-config.yaml.
This installation uses OpenStack Ironic internally and values in
install-config.yaml affect behavior of Ironic.
During installation, Ironic connects to BMC(Baseboard management controller)
and does operation related to RHOCP installation (e.g. power management).

Ironic uses the iRMC driver to operate on the Fujitsu server's BMC, and the iRMC driver checks iRMC-driver-specific parameters stored in the Ironic component.
When FIPS is enabled (i.e. "fips" is set to "true" in install-config.yaml), the iRMC driver requires the SNMP version specified in the Ironic parameter to be v3, even though the iRMC driver isn't configured to use SNMP internally.
Currently, the default value of Ironic's SNMP version parameter (an iRMC-driver-specific parameter) is v2c, not v3, and the iRMC driver fails with an error if the SNMP version is set to anything other than v3 when FIPS is enabled.

However, there is no way to set SNMP version parameter in RHOCP and that
parameter is set to v2c by default. So when FIPS is enabled, deployment of
OpenShift to Fujitsu server always fails.

The cause of the problem is that, when FIPS is enabled, the iRMC driver always requires the SNMP version parameter to be set to v3 even though the iRMC driver is not configured to use SNMP (the current RHOCP installer configures the iRMC driver not to use SNMP).
To solve this problem, the iRMC driver should be modified to check whether it is configured to use SNMP internally, and only require the SNMP version parameter to be v3 when SNMP is actually in use and FIPS is enabled.
Such a modification patch has already been submitted to the OpenStack Ironic community[1].

Summary of actions taken to resolve issue:
Use OpenStack Ironic iRMC driver which incorporates bug fix patch[1] submitted on OpenStack Ironic community.

 [1] https://review.opendev.org/c/openstack/ironic/+/881358

Description of problem:

Kubernetes and other associated dependencies need to be updated to protect against potential vulnerabilities.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-20305. The following is the description of the original issue:

Description of problem:

There is an extra space in the Chinese translation text of 'Duplicate RoleBinding' in the kebab list.

The changes from PR https://github.com/openshift/console/pull/12099 are for some reason not included in the master and release-4.12 through 4.14 branches.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-08-220853

How reproducible:

Always

Steps to Reproduce:

1. Login OCP, update language to Chinese
2. Navigate to RoleBindings page, choose one rolebinding, click the kebab icon on the end, check the translation text of 'Duplicate RoleBinding'
3.

Actual results:

2. It's shown '重复 角色绑定' and "重复 集群角色绑定"

Expected results:

Remove extra space
It's shown '重复角色绑定' and "重复集群角色绑定"

Additional info:

 

This is a clone of issue OCPBUGS-20481. The following is the description of the original issue:

Multi-vCenter credentials and a wrong user/password in secret/vmware-vsphere-cloud-credentials cause the vSphere CSI Driver controller pods to keep restarting

Description of problem:
When there are multiple vCenters in secret/vmware-vsphere-cloud-credentials in ns/openshift-cluster-csi-drivers (see bug https://issues.redhat.com/browse/OCPBUGS-20478), the vSphere CSI Driver controller pods are always restarting.

vmware-vsphere-csi-driver-controller-545dc5679f-mdsjt   0/13    Pending             0             0s
vmware-vsphere-csi-driver-controller-545dc5679f-mdsjt   0/13    ContainerCreating   0             0s
vmware-vsphere-csi-driver-controller-587f78b9c7-br4gs   0/13    Terminating         0             3s
vmware-vsphere-csi-driver-controller-545dc5679f-mdsjt   0/13    Terminating         0             1s
vmware-vsphere-csi-driver-controller-587f78b9c7-9pfmp   0/13    Pending             0             0s
vmware-vsphere-csi-driver-controller-587f78b9c7-9pfmp   0/13    Pending             0             0s
vmware-vsphere-csi-driver-controller-587f78b9c7-9pfmp   0/13    ContainerCreating   0             0s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89   12/13   Terminating         0             9s
vmware-vsphere-csi-driver-controller-b946b657-7t74p     13/13   Terminating         0             9s
vmware-vsphere-csi-driver-controller-545dc5679f-mdsjt   0/13    Terminating         0             3s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89   0/13    Terminating         0             10s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm   12/13   Terminating         0             9s
vmware-vsphere-csi-driver-controller-587f78b9c7-9pfmp   0/13    ContainerCreating   0             2s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89   0/13    Terminating         0             11s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89   0/13    Terminating         0             11s
vmware-vsphere-csi-driver-controller-587f78b9c7-qdb89   0/13    Terminating         0             11s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm   0/13    Terminating         0             10s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm   0/13    Terminating         0             11s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm   0/13    Terminating         0             11s
vmware-vsphere-csi-driver-controller-545dc5679f-75wfm   0/13    Terminating         0             11s
$ oc get co storage
storage                                    4.14.0-0.nightly-2023-10-10-084534   False       True          False      15s     VSphereCSIDriverOperatorCRAvailable: VMwareVSphereDriverControllerServiceControllerAvailable: Waiting for Deployment
$ oc logs -f deployment.apps/vmware-vsphere-csi-driver-controller --tail=500
{"level":"error","time":"2023-10-12T11:40:38.920487342Z","caller":"service/driver.go:189","msg":"failed to init controller. Error: ServerFaultCode: Cannot complete login due to an incorrect user name or password.","TraceId":"5e60e6c5-efeb-4080-888c-74182e4fb1f4","TraceId":"ec636d3d-1ddb-43a5-b9f7-8541dacff583","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).BeforeServe\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/driver.go:189\nsigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/driver.go:202\nmain.main\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}
{"level":"info","time":"2023-10-12T11:40:38.920536779Z","caller":"service/driver.go:109","msg":"Configured: \"csi.vsphere.vmware.com\" with clusterFlavor: \"VANILLA\" and mode: \"controller\"","TraceId":"5e60e6c5-efeb-4080-888c-74182e4fb1f4","TraceId":"ec636d3d-1ddb-43a5-b9f7-8541dacff583"}
{"level":"error","time":"2023-10-12T11:40:38.920572294Z","caller":"service/driver.go:203","msg":"failed to run the driver. Err: +ServerFaultCode: Cannot complete login due to an incorrect user name or password.","TraceId":"5e60e6c5-efeb-4080-888c-74182e4fb1f4","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/pkg/csi/service/driver.go:203\nmain.main\n\t/go/src/github.com/kubernetes-sigs/vsphere-csi-driver/cmd/vsphere-csi/main.go:71\nruntime.main\n\t/usr/lib/golang/src/runtime/proc.go:250"}
$ oc logs vmware-vsphere-csi-driver-operator-b4b8d5d56-f76pc
I1012 11:43:08.973130       1 event.go:298] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-cluster-csi-drivers", Name:"vmware-vsphere-csi-driver-operator", UID:"a8492b8c-8c13-4b15-aedc-6f3ced80618e", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'DeploymentUpdateFailed' Failed to update Deployment.apps/vmware-vsphere-csi-driver-controller -n openshift-cluster-csi-drivers: Operation cannot be fulfilled on deployments.apps "vmware-vsphere-csi-driver-controller": the object has been modified; please apply your changes to the latest version and try again
E1012 11:43:08.996554       1 base_controller.go:268] VMwareVSphereDriverControllerServiceController reconciliation failed: Operation cannot be fulfilled on deployments.apps "vmware-vsphere-csi-driver-controller": the object has been modified; please apply your changes to the latest version and try again
W1012 11:43:08.999148       1 driver_starter.go:206] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver
W1012 11:43:09.390489       1 driver_starter.go:206] CSI driver can only connect to one vcenter, more than 1 set of credentials found for CSI driver

Version-Release number of selected component (if applicable):
4.14.0-0.nightly-2023-10-10-084534

How reproducible:
Always 

Steps to Reproduce:
See Description

Actual results:
Storage CSI Driver pods are restarting 

Expected results:
Storage CSI Driver pods should not be restarting 

This is a clone of issue OCPBUGS-5491. The following is the description of the original issue:

Description of problem:

The issue was found in CI on an Azure private cluster: all of the egressIP cases failed because the EgressIP could not be applied to the egress node. It can also be reproduced manually.

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2023-01-08-142418

How reproducible:

Always

Steps to Reproduce:

1. Label one worker node as egress node
2. Create one egressIP object
3.

Actual results:

% oc get egressip
NAME             EGRESSIPS    ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-2       10.0.1.10                    
egressip-47164   10.0.1.217 

% oc get cloudprivateipconfig 
NAME         AGE
10.0.1.10    18m
10.0.1.217   22m
% oc get cloudprivateipconfig  -o yaml
apiVersion: v1
items:
- apiVersion: cloud.network.openshift.io/v1
  kind: CloudPrivateIPConfig
  metadata:
    annotations:
      k8s.ovn.org/egressip-owner-ref: egressip-2
    creationTimestamp: "2023-01-09T10:11:33Z"
    finalizers:
    - cloudprivateipconfig.cloud.network.openshift.io/finalizer
    generation: 1
    name: 10.0.1.10
    resourceVersion: "59723"
    uid: d697568a-7d7c-471a-b5e1-d7b814244549
  spec:
    node: huirwang-0109b-bv4ld-worker-eastus1-llmpb
  status:
    conditions:
    - lastTransitionTime: "2023-01-09T10:17:06Z"
      message: 'Error processing cloud assignment request, err: network.InterfacesClient#CreateOrUpdate:
        Failure sending request: StatusCode=0 -- Original Error: Code="OutboundRuleCannotBeUsedWithBackendAddressPoolThatIsReferencedBySecondaryIpConfigs"
        Message="OutboundRule /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/loadBalancers/huirwang-0109b-bv4ld/outboundRules/outbound-rule-v4
        cannot be used with Backend Address Pool /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/loadBalancers/huirwang-0109b-bv4ld/backendAddressPools/huirwang-0109b-bv4ld
        that contains Secondary IPConfig /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/networkInterfaces/huirwang-0109b-bv4ld-worker-eastus1-llmpb-nic/ipConfigurations/huirwang-0109b-bv4ld-worker-eastus1-llmpb_10.0.1.10"
        Details=[]'
      observedGeneration: 1
      reason: CloudResponseError
      status: "False"
      type: Assigned
    node: huirwang-0109b-bv4ld-worker-eastus1-llmpb
- apiVersion: cloud.network.openshift.io/v1
  kind: CloudPrivateIPConfig
  metadata:
    annotations:
      k8s.ovn.org/egressip-owner-ref: egressip-47164
    creationTimestamp: "2023-01-09T10:07:56Z"
    finalizers:
    - cloudprivateipconfig.cloud.network.openshift.io/finalizer
    generation: 1
    name: 10.0.1.217
    resourceVersion: "58333"
    uid: 6a7d6196-cfc9-4859-9150-7371f5818b74
  spec:
    node: huirwang-0109b-bv4ld-worker-eastus1-llmpb
  status:
    conditions:
    - lastTransitionTime: "2023-01-09T10:13:29Z"
      message: 'Error processing cloud assignment request, err: network.InterfacesClient#CreateOrUpdate:
        Failure sending request: StatusCode=0 -- Original Error: Code="OutboundRuleCannotBeUsedWithBackendAddressPoolThatIsReferencedBySecondaryIpConfigs"
        Message="OutboundRule /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/loadBalancers/huirwang-0109b-bv4ld/outboundRules/outbound-rule-v4
        cannot be used with Backend Address Pool /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/loadBalancers/huirwang-0109b-bv4ld/backendAddressPools/huirwang-0109b-bv4ld
        that contains Secondary IPConfig /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/huirwang-0109b-bv4ld-rg/providers/Microsoft.Network/networkInterfaces/huirwang-0109b-bv4ld-worker-eastus1-llmpb-nic/ipConfigurations/huirwang-0109b-bv4ld-worker-eastus1-llmpb_10.0.1.217"
        Details=[]'
      observedGeneration: 1
      reason: CloudResponseError
      status: "False"
      type: Assigned
    node: huirwang-0109b-bv4ld-worker-eastus1-llmpb
kind: List
metadata:
  resourceVersion: ""

Expected results:

EgressIP can be applied correctly

Additional info:


Sanitize OWNERS/OWNER_ALIASES:

1) OWNERS must have:

component: "Storage / Kubernetes External Components"

2) OWNER_ALIASES must have all team members of Storage team.

Description of problem:

The CCMs at the moment are given RBAC permissions of "get, list, watch" on secrets across all namespaces. This was a security concern raised by the OpenShift Security team. 

In the Nutanix CCM, it currently creates a secrets informer and a configmaps informer at the cluster scope; these are then passed into the NewProvider call for the prism environment. Within the prism environment, the configmap and secret informers are each used once, and only to list a single namespace. We should modify the informer creation to limit it to just the namespaces required, as in the sketch below. This would reduce the scope of RBAC required and meet the OpenShift security requirements.
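A minimal sketch of that direction, assuming client-go is used directly; the namespace name, resync period, and kubeconfig path are illustrative, not the actual Nutanix CCM values:

```go
// Sketch: build secret/configmap informers scoped to a single namespace
// instead of the whole cluster, so RBAC can be a namespaced Role rather
// than a cluster-wide ClusterRole on secrets/configmaps.
package main

import (
	"time"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // illustrative path
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// WithNamespace limits every informer built from this factory to one namespace.
	factory := informers.NewSharedInformerFactoryWithOptions(
		client, 10*time.Minute,
		informers.WithNamespace("openshift-cloud-controller-manager"), // assumption: the single namespace the provider reads
	)

	secretInformer := factory.Core().V1().Secrets().Informer()
	configMapInformer := factory.Core().V1().ConfigMaps().Informer()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	_ = secretInformer
	_ = configMapInformer
}
```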

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:


Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-17669. The following is the description of the original issue:

Description of problem:

The HostedCluster name is not currently validated against RFC1123.

Version-Release number of selected component (if applicable):

 

How reproducible:

Every time

Steps to Reproduce:

1.
2.
3.

Actual results:

Any HostedCluster name is allowed

Expected results:

Only HostedCluster names meeting RFC1123 validation should be allowed.
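For illustration only (not the actual HyperShift validation code), such a check could use the apimachinery validation helpers; whether the product should enforce the RFC 1123 label or subdomain variant is an assumption here:

```go
// Sketch of validating a HostedCluster name against the RFC 1123 label rules.
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/validation"
)

func validateHostedClusterName(name string) error {
	// RFC 1123 label: lowercase alphanumerics and '-', max 63 characters,
	// must start and end with an alphanumeric character.
	if errs := validation.IsDNS1123Label(name); len(errs) > 0 {
		return fmt.Errorf("invalid HostedCluster name %q: %v", name, errs)
	}
	return nil
}

func main() {
	fmt.Println(validateHostedClusterName("my-cluster"))  // <nil>
	fmt.Println(validateHostedClusterName("My_Cluster!")) // error
}
```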

Additional info:

 

Ovnkube-node container max memory usage was 110 MiB with 4.14.0-0.nightly-2023-05-18-231932 image and now it is 530 MiB with 4.14.0-0.nightly-2023-07-31-181848 image, for the same test (cluster-density-v2 with 800 iterations, churn=false) on 120 node environment. We observed the same pattern in the OVN-IC environment as well.

Note: As churn is false, we are calculating memory usage for only resource creation.

Grafana panel for OVN with 4.14.0-0.nightly-2023-05-18-231932 image -

https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/H9pAb07fsPEOFyd5dhKLFP602A7S18uC

 

Grafana panel for OVN with 4.14.0-0.nightly-2023-07-31-181848 image -

https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/8158bJgv3e4P2uiVernbc2E5ypBWFYHt

 

As the test was successfully run in the CI, we couldn't collect a must-gather. I can provide must-gather and pprof data if needed.

 

We observed 100 MiB to 550 MiB increase in OVN-IC between 4.14.0-0.nightly-2023-06-12-141936 and  4.14.0-0.nightly-2023-07-30-191504 versions.

OVN-IC  4.14.0-0.nightly-2023-06-12-141936

https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/o5SXLdHIL8whsdgaMyXwWamipBP8J2fF

 

OVN-IC 4.14.0-0.nightly-2023-07-30-191504

https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/NMuSQx7YAJ9jokoKMl6Me9StHp33tjwD

Please review the following PR: https://github.com/openshift/kube-state-metrics/pull/91

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

In the Reliability (loaded long-run) test, the memory of the ovnkube-node-xxx pods on all 6 nodes keeps increasing. Within 24 hours it increased to about 1.6G. I did not see this issue in previous releases.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-27-000502

How reproducible:

This is the first time I have seen this issue

Steps to Reproduce:

1. Install an AWS OVN cluster with 3 masters and 3 workers; vm_type is m5.xlarge for all nodes.
2. Run the reliability-v2 test https://github.com/openshift/svt/tree/master/reliability-v2 with config: 1 admin, 15 dev-test, 1 dev-prod. The test long-runs the configured tasks.
3. Monitor the test failures and the performance dashboard.

Test failures slack notification: https://redhat-internal.slack.com/archives/C0266JJ4XM5/p1687944463913769

Performance dashboard:http://dittybopper-dittybopper.apps.qili-414-haproxy.qe-lrc.devcluster.openshift.com/d/IgK5MW94z/openshift-performance?orgId=1&from=1687944452000&to=now&refresh=1h

Actual results:

The memory of the ovnkube-node-xxx pods on all 6 nodes keeps increasing.
Within 24 hours it increased to about 1.6G.

Expected results:

The memory of the ovnkube-node-xxx pods should stay stable rather than keep increasing.

Additional info:

% oc adm top pod -n openshift-ovn-kubernetes | grep node
ovnkube-node-4t282     146m         1862Mi          
ovnkube-node-9p462     41m          1847Mi          
ovnkube-node-b6rqj     46m          2032Mi          
ovnkube-node-fp2gn     72m          2107Mi          
ovnkube-node-hxf95     11m          2359Mi          
ovnkube-node-ql9fx     38m          2089Mi          

I took a pprof heap profile of one of the pods and uploaded it as heap-ovnkube-node-4t282.out
Must-gather is uploaded to must-gather.local.1315176578017655774.tar.gz
A performance dashboard screenshot is attached as ovnkube-node-memory.png

Description of problem

CI is flaky because tests pull the "openshift/origin-node" image from Docker Hub and get rate-limited:

E0803 20:44:32.429877    2066 kuberuntime_image.go:53] "Failed to pull image" err="rpc error: code = Unknown desc = reading manifest latest in docker.io/openshift/origin-node: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit" image="openshift/origin-node:latest"

This particular failure comes from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/929/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/16871891662673059841687189166267305984. I don't know how to search for this failure using search.ci. I discovered the rate-limiting through Loki: https://grafana-loki.ci.openshift.org/explore?orgId=1&left=%7B%22datasource%22:%22PCEB727DF2F34084E%22,%22queries%22:%5B%7B%22expr%22:%22%7Binvoker%3D%5C%22openshift-internal-ci%2Fpull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator%2F1687189166267305984%5C%22%7D%20%7C%20unpack%20%7C~%20%5C%22pull%20rate%20limit%5C%22%22,%22refId%22:%22A%22,%22editorMode%22:%22code%22,%22queryType%22:%22range%22%7D%5D,%22range%22:%7B%22from%22:%221691086303449%22,%22to%22:%221691122303451%22%7D%7D.

Version-Release number of selected component (if applicable)

This happened on 4.14 CI job.

How reproducible

I have observed this once so far, but it is quite obscure.

Steps to Reproduce

1. Post a PR and have bad luck.
2. Check Loki using the following query:

{...} {invoker="openshift-internal-ci/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/*"} | unpack | systemd_unit="kubelet.service" |~ "pull rate limit"

Actual results

CI pulls from Docker Hub and fails.

Expected results

CI passes, or fails on some other test failure. CI should never pull from Docker Hub.

Additional info

We have been using the "openshift/origin-node" image in multiple tests for years. I have no idea why it is suddenly pulling from Docker Hub, or how we failed to notice that it was pulling from Docker Hub if that's what it was doing all along.

This is a clone of issue OCPBUGS-20181. The following is the description of the original issue:

Description of problem:

unit test failures rates are high https://prow.ci.openshift.org/job-history/gs/origin-ci-test/pr-logs/directory/pull-ci-openshift-oc-master-unit

TestNewAppRunAll/emptyDir_volumes is failing

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_oc/1557/pull-ci-openshift-oc-master-unit/1710206848667226112

Version-Release number of selected component (if applicable):

 

How reproducible:

Run locally or in CI and see that the unit test job is failing

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Dev sandbox - CronJobs table/details UI doesn't have Suspend indication

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Always

Steps to Reproduce:

1. Create sample CronJob with either @daily or @hourly as schedule
2. Navigate to Administrator/Workloads/CronJobs area
3. Observe that the CronJobs table contains your created entry, but there is no column with a Suspend True/False indication
4. Navigate into that same CronJob's details - the Suspend state is still not shown
5. Then invoke the 'oc get cj' command; example output could be:
NAME      SCHEDULE   SUSPEND   ACTIVE   LAST SCHEDULE   AGE
example   @hourly    True      0        24m             34m

where you can see a separate SUSPEND column

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

The option to Enable/Disable a console plugin on the Operator details page is not shown any more; it looks like a regression (the option is shown in 4.13).

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-04-19-125337

How reproducible:

Always

Steps to Reproduce:

1. Subscribe to the 'OpenShift Data Foundation' Operator from OperatorHub
2. On the Operator installation page, choose to 'Disable' the plugin
3. Once the operator is successfully installed, go to the Installed Operators list page /k8s/all-namespaces/operators.coreos.com~v1alpha1~ClusterServiceVersion
4. The console will show a 'Plugin available' button for the 'OpenShift Data Foundation' Operator; click on the button and hit 'View operator details', and the user will be taken to the Operator details page

Actual results:

4. In OCP <= 4.13, we show a 'Console plugin' item where the user can Enable/Disable the console plugin the operator has brought in

however, this option is not shown in 4.14

Expected results:

4. The Enable/Disable console plugin option should be shown on the Operator details page

Additional info:

screen recording https://drive.google.com/drive/folders/1fNlodAg6yUeUqf07BG9scvwHlzAwS-Ao?usp=share_link 

This is a clone of issue OCPBUGS-18788. The following is the description of the original issue:

Description of problem:

metal3-baremetal-operator-7ccb58f44b-xlnnd pod failed to start on the SNO baremetal dualstack cluster:

Events:
  Type     Reason                  Age                    From               Message
  ----     ------                  ----                   ----               -------
  Normal   Scheduled               34m                    default-scheduler  Successfully assigned openshift-machine-api/metal3-baremetal-operator-7ccb58f44b-xlnnd to sno.ecoresno.lab.eng.tlv2.redha
t.com
  Warning  FailedScheduling        34m                    default-scheduler  0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports. preemption: 0/1 nodes are availabl
e: 1 node(s) didn't have free ports for the requested pod ports..
  Warning  FailedCreatePodSandBox  34m                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to add hostport mapping for sandbox k8s_metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0(c4a8b353e3ec105d2bff2eb1670b82a0f226ac1088b739a256deb9dfae6ebe54): cannot open hostport 60000 for pod k8s
_metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0_: listen tcp4 :60000: bind: address already in use
  Warning  FailedCreatePodSandBox  34m                    kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to add hostport mapping for sandbox k8s_metal3-bare
metal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0(9e6960899533109b02fbb569c53d7deffd1ac8185cef3d8677254f9ccf9387ff): cannot open hostport 60000 for pod k8s
_metal3-baremetal-operator-7ccb58f44b-xlnnd_openshift-machine-api_5f6d8c69-a508-47f3-a6b1-7701b9d3617e_0_: listen tcp4 :60000: bind: address already in use

Version-Release number of selected component (if applicable):

4.14.0-rc.0

How reproducible:

so far once

Steps to Reproduce:

1. Deploy disconnected baremetal SNO node with dualstack networking with agent-based installer
2.
3.

Actual results:

metal3-baremetal-operator pod fails to start

Expected results:

metal3-baremetal-operator pod is running

Additional info:

Checking the ports on the node showed it was the `kube-apiserver` process bound to the port:

tcp   ESTAB      0      0                                                [::1]:60000                        [::1]:2379    users:(("kube-apiserver",pid=43687,fd=455))


After rebooting the node all pods started as expected

Reported by IBM.

Apparently, they run in such a way that status.Version.Desired.Version is not guaranteed to be a parseable semantic version. Thus isUpgradeble returns an error and blocks the upgrade, even if the force-upgrade annotation is present.

We should check for the annotation first; if the upgrade is being forced, we don't need to do the z-stream upgrade check.
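A rough sketch of that ordering, assuming a semver library such as blang/semver; the annotation key and function shape are hypothetical, not the actual HyperShift code:

```go
// Sketch: honor a (hypothetical) force-upgrade annotation before attempting to
// parse the desired version, so an unparseable version string does not block a
// forced upgrade.
package main

import (
	"fmt"

	"github.com/blang/semver/v4"
)

const forceUpgradeAnnotation = "example.com/force-upgrade" // assumption: the real key differs

func isUpgradeAllowed(annotations map[string]string, currentVersion, desiredVersion string) error {
	// Forced upgrades skip the z-stream/semver comparison entirely.
	if annotations[forceUpgradeAnnotation] == "true" {
		return nil
	}
	cur, err := semver.Parse(currentVersion)
	if err != nil {
		return fmt.Errorf("cannot parse current version %q: %w", currentVersion, err)
	}
	des, err := semver.Parse(desiredVersion)
	if err != nil {
		return fmt.Errorf("cannot parse desired version %q: %w", desiredVersion, err)
	}
	if des.LT(cur) {
		return fmt.Errorf("downgrade from %s to %s is not supported", cur, des)
	}
	return nil
}

func main() {
	ann := map[string]string{forceUpgradeAnnotation: "true"}
	// Forced: returns nil without ever parsing the version strings.
	fmt.Println(isUpgradeAllowed(ann, "not-a-semver", "4.13.0"))
}
```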

https://redhat-internal.slack.com/archives/C01C8502FMM/p1689279310050439

Description of problem:

The image registry pruner job fails when the cluster was installed without the DeploymentConfig capability.

The cluster was installed only with the following capabilities:
{\"capabilities\":{\"baselineCapabilitySet\": \"None\", \"additionalEnabledCapabilities\": [ \"marketplace\", \"NodeTuning\" ] }}"

image-pruner pods are failing with the following error:

    state:
      terminated:
        containerID: cri-o://69562d80cafb23a07b9f1d020e1943448916558986092d8540b9a0e1fc3731a1
        exitCode: 1
        finishedAt: "2023-08-21T00:07:37Z"
        message: |
          Error from server (NotFound): the server could not find the requested resource (get deploymentconfigs.apps.openshift.io)
          attempt #1 has failed (exit code 1), going to make another attempt...
          Error from server (NotFound): the server could not find the requested resource (get deploymentconfigs.apps.openshift.io)
          attempt #2 has failed (exit code 1), going to make another attempt...
          Error from server (NotFound): the server could not find the requested resource (get deploymentconfigs.apps.openshift.io)
          attempt #3 has failed (exit code 1), going to make another attempt...
          Error from server (NotFound): the server could not find the requested resource (get deploymentconfigs.apps.openshift.io)
          attempt #4 has failed (exit code 1), going to make another attempt...
          Error from server (NotFound): the server could not find the requested resource (get deploymentconfigs.apps.openshift.io)
          attempt #5 has failed (exit code 1), going to make another attempt...
          Error from server (NotFound): the server could not find the requested resource (get deploymentconfigs.apps.openshift.io)
        reason: Error
        startedAt: "2023-08-21T00:00:05Z"

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-16-114741

How reproducible:

100%

Steps to Reproduce:

1. Install an SNO cluster without the DeploymentConfig capability
2. Check image pruner jobs status

Actual results:

Image pruner jobs do not complete because the deploymentconfigs.apps.openshift.io API is not available.

Expected results:

Image pruner jobs can run without the deploymentconfigs API
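One way to express that, sketched for illustration only (not the actual pruner code): probe discovery for the apps.openshift.io group and skip the DeploymentConfig pruning step when it is not served.

```go
// Sketch: detect whether deploymentconfigs.apps.openshift.io is served before
// pruning DeploymentConfig references.
package main

import (
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func hasDeploymentConfigs(dc discovery.DiscoveryInterface) (bool, error) {
	resources, err := dc.ServerResourcesForGroupVersion("apps.openshift.io/v1")
	if err != nil {
		if apierrors.IsNotFound(err) {
			// Group/version not served: the capability is disabled.
			return false, nil
		}
		return false, err
	}
	for _, r := range resources.APIResources {
		if r.Name == "deploymentconfigs" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig") // illustrative path
	if err != nil {
		panic(err)
	}
	dc := discovery.NewDiscoveryClientForConfigOrDie(cfg)
	ok, err := hasDeploymentConfigs(dc)
	fmt.Println("prune DeploymentConfig references:", ok, err)
}
```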

Additional info:

 

Please review the following PR: https://github.com/openshift/cloud-provider-alibaba-cloud/pull/30

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

It seems the e2e-metal-ipi-ovn-dualstack job has been permafailing for the last couple of days.
sippy link

One common symptom seems to be that some nodes are not being fully provisioned.
Here is an example from this job

You can see the clusteroperators are not happy and, specifically, machine-api is stuck in init

Description of problem:

Nodes are taking more than 5m0s to stage OSUpdate

https://sippy.dptools.openshift.org/sippy-ng/tests/4.13/analysis?test=%5Bbz-Machine%20Config%20Operator%5D%20Nodes%20should%20reach%20OSUpdateStaged%20in%20a%20timely%20fashion 

Test started failing back on 2/16/2023. First occurrence of the failure https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.13-e2e-aws-sdn-upgrade/1626326464246845440 

Most recent occurrences across multiple platforms https://search.ci.openshift.org/?search=Nodes+should+reach+OSUpdateStaged+in+a+timely+fashion&maxAge=48h&context=1&type=junit&name=4.13&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job 

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

6 nodes took over 5m0s to stage OSUpdate:node/ip-10-0-216-81.ec2.internal OSUpdateStarted at 2023-02-16T22:24:56Z, did not make it to OSUpdateStaged
node/ip-10-0-174-123.ec2.internal OSUpdateStarted at 2023-02-16T22:13:07Z, did not make it to OSUpdateStaged
node/ip-10-0-144-29.ec2.internal OSUpdateStarted at 2023-02-16T22:12:50Z, did not make it to OSUpdateStaged
node/ip-10-0-179-251.ec2.internal OSUpdateStarted at 2023-02-16T22:15:48Z, did not make it to OSUpdateStaged
node/ip-10-0-180-197.ec2.internal OSUpdateStarted at 2023-02-16T22:19:07Z, did not make it to OSUpdateStaged
node/ip-10-0-213-155.ec2.internal OSUpdateStarted at 2023-02-16T22:19:21Z, did not make it to OSUpdateStaged}

Expected results:

 

Additional info:

 

Description of problem:

The customer upgraded an AWS cluster from 4.8 to 4.9. Everything updated well, but when checking co/storage.status.versions, the AWSEBSCSIDriverOperator version is listed with the previous version: 
$ oc get co storage -o json | jq .status.versions
[
  {
    "name": "operator",
    "version": "4.9.50"
  },
  {
    "name": "AWSEBSCSIDriverOperator",
    "version": "4.8.48"
  }
]

From 4.9, it seems the CSO no longer reports the CSIDriverOperator version, so the stale previous CSIDriverOperator version should be cleaned up in this case.

Version-Release number of selected component (if applicable):

upgrade from 4.8.48 to 4.9.50

How reproducible:

Always

Steps to Reproduce:

1. Install AWS cluster with 4.8
2. Upgrade cluster to 4.9
3. Check co/storage.status.versions  

Actual results:

[ { "name": "operator", "version": "4.9.50" }, { "name": "AWSEBSCSIDriverOperator", "version": "4.8.48" } ]

Expected results:

From 4.9, it seems the CSO no longer reports the CSIDriverOperator version, so the previous CSIDriverOperator version, which is not correct, should be cleaned up.

Additional info:

 

Description of problem:

We have OCP 4.10 installed along with Tigera 3.13 with no issues. We could also update OCP to 4.11 and 4.12 along with a Tigera upgrade to 3.15 and 3.16. The upgrade works with no issue. The problem appears when we install Tigera 3.16 along with OCP 4.12 (fresh install).
Tigera support says the OCP install parameters need to be updated to accommodate the new Tigera updates. It is either the Terraform plug-in or a file called main.tf that needs an update. 
Please engage someone from the Red Hat OCP engineering team.

Ref doc:  https://access.redhat.com/solutions/6980264

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

Install Tigera 3.16 along with OCP 4.12 (fresh install).

Actual results:

Installation fails with the error: "rpc error: code = ResourceExhausted desc = grpc: received message larger than max (5330750 vs. 4194304)"

Expected results:

Just like 4.10, the 4.12 installation should work with Tigera Calico

Additional info:

 

Description of problem:

The target.workload.openshift.io/management annotation causes CNO operator pods to wait for nodes to appear. Eventually they give up waiting and they get scheduled. This annotation should not be set for the hosted control plane topology, given that we should not wait for nodes to exist for the CNO to be scheduled.

Version-Release number of selected component (if applicable):

4.14, 4.13

How reproducible:

always

Steps to Reproduce:

1. Create IBM ROKS cluster
2. Wait for cluster to come up
3.

Actual results:

Cluster takes a long time to come up because CNO pods take ~15 min to schedule.

Expected results:

Cluster comes up quickly

Additional info:

Note: Verification for the fix has already happened on the IBM Cloud side. All OCP QE needs to do is to make sure that the fix doesn't cause any regression to the regular OCP use case.

This is a clone of issue OCPBUGS-19018. The following is the description of the original issue:

Using metal-ipi on 4.14, the cluster is failing to come up.

 

The network cluster-operator is failing to start; the sdn pod shows the error

bash: RHEL_VERSION: unbound variable

Description of problem:

In the implementation of METAL-163, the support for the new Ironic Node field external_http_url was only added for floppy-based configuration images, not for CD images that we use in OpenShift. This makes external_http_url a no-op.

See https://review.opendev.org/c/openstack/ironic/+/901696

Description of Problem:

While creating a Build via the console, the YAML of the Build or BuildRun is switched to the shipwright.io/v1alpha1 version, whereas we support shipwright.io/v1beta1 with GA.

Also, once the Builds/BuildRuns are created, the same BuildRun will show v1alpha1 YAML on the console and v1beta1 YAML when checked through the CLI.

Concerns:
- The mismatch of apiVersion on console vs CLI could be really confusing for customers.
- All our documents have v1beta1 references, and if a user is completely dependent on the console, they might not be able to configure the builds easily.

Version-Release number of selected component (if applicable):

OCP 4.14
builds for OpenShift v1.0.0

How reproducible:

Always

Steps to Reproduce:

1. Deploy the builds for Red Hat OpenShift release candidate operator
2. Go to the "Developer preview" and create a Build via console. Specify the following yaml definition:
~~~
apiVersion: shipwright.io/v1beta1
kind: Build
metadata:
  name: buildah-golang-build
spec:
  source:
    git:
      url: https://github.com/shipwright-io/sample-go
    contextDir: docker-build
  strategy:
    name: buildah
    kind: ClusterBuildStrategy
  paramValues:
  - name: dockerfile
    value: Dockerfile
  output:
    image: image-registry.openshift-image-registry.svc:5000/<namespace>/sample-go-app
~~~
3. Once the build is created, start the build.
4. Check the yamls of build & buildRun, vs, the yaml through CLI:
   $ oc get builds.shipwright.io buildah-golang-build -oyaml
   $ oc get br <buildah-golang-buildrun-name> -oyaml
5. Observe the difference in apiVersions being referred.

Actual results:

The apiVersions are different for the same object.

Expected results:

The apiVersions should be in sync (console & CLI).

Additional info:

This issue is not observed for OCP 4.15 nightly.

Description of problem: While running scale tests of OpenShift on OpenStack, we're seeing it perform significantly worse than on the AWS platform for the same number of nodes. More specifically, we're seeing high traffic to the API server and high load on the haproxy pod.

Version-Release number of selected component (if applicable):

All supported versions

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

Slack thread at https://coreos.slack.com/archives/CBZHF4DHC/p1669910986729359 provides more info.

Please review the following PR: https://github.com/openshift/prometheus-operator/pull/221

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

In the control plane machine set operator we perform e2e periodic tests that check the ability to do a rolling update of an entire OCP control plane.

This is a quite involved test as we need to drain and replace all the master machines/nodes, altering operators, waiting for machines to come up + bootstrap and nodes to drain and move their workloads to others while respecting PDBs, and etcd quorum.

As such, we need to make sure we are robust to transient issues, occasional slow-downs, and network errors.

We have investigated these timeout issues and identified some common culprits that we want to address, see: https://redhat-internal.slack.com/archives/GE2HQ9QP4/p1678966522151799

Description of problem:

2022-09-12T13:48:57.505323919Z {"level":"info","ts":1662990537.5052269,"logger":"controllers.BareMetalHost","msg":"start","baremetalhost":"qe2/master-1-0"}
2022-09-12T13:48:57.566917845Z {"level":"info","ts":1662990537.5668473,"logger":"provisioner.ironic","msg":"no node found, already deleted","host":"qe2~master-1-0"}
2022-09-12T13:48:57.566945972Z {"level":"info","ts":1662990537.566904,"logger":"controllers.BareMetalHost","msg":"done","baremetalhost":"qe2/master-1-0","provisioningState":"available","requeue":true,"after":600}
2022-09-12T13:49:13.556690278Z {"level":"info","ts":1662990553.556591,"logger":"controllers.HostFirmwareSettings","msg":"start","hostfirmwaresettings":"qe2/master-1-0"}
2022-09-12T13:49:13.614818643Z {"level":"info","ts":1662990553.6147015,"logger":"controllers.HostFirmwareSettings","msg":"retrieving firmware settings and saving to resource","hostfirmwaresettings":"qe2/master-1-0","node":"48d24898-1911-4f43-82b0-0b15f8484ae7"}
2022-09-12T13:49:13.629455616Z {"level":"info","ts":1662990553.6293764,"logger":"controllers.HostFirmwareSettings","msg":"provisioner returns error","hostfirmwaresettings":"qe2/master-1-0","RequeueAfter:":30}

Version-Release number of selected component (if applicable):

 

How reproducible:

always

Steps to Reproduce:

1. Detach a BMH
2. Check BMO logs for errors
3. Check Ironic logs for errors

Actual results:

BMO and Ironic logs have errors related to the already deleted node.

Expected results:

No noise in the logs.

Additional info:

 

User Story:

As IBM, running HCs, I want to upgrade an existing 4.12 HC suffering from https://issues.redhat.com/browse/OCPBUGS-13639 to 4.13 and have the private link endpoint use the right security group.

Acceptance Criteria:

There are automated/documented steps for the HC to end up with the endpoint pointing to the right SG.

A possible semi-automated path would be to manually delete and detach the endpoint from the service, so the next reconciliation loop resets the status https://github.com/openshift/hypershift/blob/7d24b30c6f79be052404bf23ede7783342f0d0e5/control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go#L410-L444

And the next one would recreate the new endpoint with the right security group https://github.com/openshift/hypershift/blob/7d24b30c6f79be052404bf23ede7783342f0d0e5/control-plane-operator/controllers/awsprivatelink/awsprivatelink_controller.go#L470-L525

Note this would produce connectivity downtime while reconciliation happens.

Alternatively we could codify a path to update the endpoint SG when we detect a discrepancy with the hypershift SG.

 

Please review the following PR: https://github.com/openshift/images/pull/131

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

When doing an IPv6-only agent-based install on bare metal, this fails if the rendezvousIP value is not canonical. 

Version-Release number of selected component (if applicable):

OCP 4.12

How reproducible:

Every time.

Steps to Reproduce:

1. Configure the agent through agent-config.yaml for an IPv6-only install.
2. Set rendezvousIP to something that is correct, but not canonical:
   for example: rendezvousIP: 2a00:8a00:4000:020c:0000:0000:0018:143c
3. Generate the discovery ISO and boot the nodes.

Actual results:

Installation fails because the set-node-zero.sh script fails to discover that it is running on node zero.

Expected results:

Installation completes. 

Additional info:

The code that detects whether a host is node-zero uses this:

is_rendezvous_host=$(ip -j address | jq "[.[].addr_info] | flatten | map(.local==\"$NODE_ZERO_IP\") | any")

This fails in unexpected ways with IPv6 addresses that are not canonical, as the output of `ip address` is always canonical, but in this case the value of $NODE_ZERO_IP wasn't.
We did test this on the node itself: 

[root@slabnode2290 bin]# ip -j address | jq '[.[].addr_info] | flatten | map(.local=="2a00:8a00:4000:020c:0000:0000:0018:143c") | any' 
false

[root@slabnode2290 bin]# ip -j address | jq '[.[].addr_info] | flatten | map(.local=="2a00:8a00:4000:20c::18:143c") | any'
true

A solution may be to use a tool like ipcalc, once available, to do this test and make it less strict. In the meantime, a note in the docs would be a good idea.
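For illustration, comparing parsed addresses instead of strings sidesteps the canonical-form problem; this Go sketch uses net/netip and is not the actual set-node-zero.sh logic:

```go
// Sketch: decide whether a node is node-zero by comparing parsed addresses,
// so zero-padded and canonical IPv6 forms compare equal.
package main

import (
	"fmt"
	"net/netip"
)

func isNodeZero(configured string, nodeAddrs []string) (bool, error) {
	want, err := netip.ParseAddr(configured)
	if err != nil {
		return false, fmt.Errorf("invalid rendezvousIP %q: %w", configured, err)
	}
	for _, a := range nodeAddrs {
		got, err := netip.ParseAddr(a)
		if err != nil {
			continue // skip unparseable entries
		}
		if got == want {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Non-canonical configured value vs. the canonical form reported by `ip address`.
	ok, _ := isNodeZero("2a00:8a00:4000:020c:0000:0000:0018:143c",
		[]string{"2a00:8a00:4000:20c::18:143c", "fe80::1"})
	fmt.Println(ok) // true
}
```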

 

MGMT-7549 added a change to use openshift-install instead of openshift-baremetal-install for platform:none clusters. This was to work around a problem where the baremetal binary was not available for an ARM target cluster, and at the time only none platform was supported on ARM. This problem was resolved by MGMT-9206, so we no longer need the workaround.

Please review the following PR: https://github.com/openshift/cluster-kube-controller-manager-operator/pull/726

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

oc login --token=$token
--server=https://api.dalh-dev-hs-2.05zb.p3.openshiftapps.com:443 --certificate-authority=ca.crt
The server uses a certificate signed by an unknown authority.
You can bypass the certificate check, but any data you send to the server could be intercepted by others.

The referenced "ca.crt" comes from the Secret created when a Service Account is created.

Version-Release number of selected component (if applicable): 4.12.12

How reproducible: Always

This is a clone of issue OCPBUGS-23921. The following is the description of the original issue:

Description of problem:

    The way CCM is deployed, it gets the kubeconfig configuration from the environment it runs on, which is the Management cluster. Thus, it communicates with the Kubernetes API Server (KAS) of the Management Cluster (MC) instead of the KAS of the Hosted Cluster it is part of.
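For context, a minimal sketch of what "talk to the hosted cluster's KAS" means in client-go terms; the kubeconfig path is an assumption, not the real mount point used by the control-plane operator:

```go
// Sketch: build the client from an explicitly mounted hosted-cluster
// kubeconfig instead of the in-cluster (management cluster) environment.
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	const hostedKubeconfig = "/etc/kubernetes/kubeconfig" // assumption: mounted HC kubeconfig path
	cfg, err := clientcmd.BuildConfigFromFlags("", hostedKubeconfig)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Calls made through this client go to the hosted cluster's kube-apiserver.
	version, err := client.Discovery().ServerVersion()
	if err != nil {
		panic(err)
	}
	fmt.Println("talking to kube-apiserver version:", version.GitVersion)
}
```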

Version-Release number of selected component (if applicable):

    4.15.0

How reproducible:

    100%

Steps to Reproduce:

    1. Deploy a hosted cluster
    2. oc debug to the node running the HC CCM
    3. crictl ps -a to list all the containers
    4. crictl inspect X  # Where X is the container id of the CCM container
    5. nsenter -n -t pid_of_ccm_container
    6. tcpdump
    

Actual results:

    Communication goes to MC KAS

Expected results:

    Communication goes to HC KAS

Additional info:

    

Description of problem:

GCP XPN installs require the permission `projects/<host-project>/roles/dns.networks.bindPrivateDNSZone` in the host project. This permission is not always provided in organizations. The installer requires this permission in order to create a private DNS zone and bind it to the shared networks.

Instead, the installer should be able to create records in a provided private zone that matches the base domain.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

Open the web console and navigate to Dashboards; with the default API performance V2 option selected, each sub-page shows "No datapoints found".
 

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-27-000502

How reproducible:

always 

Steps to Reproduce:

1. Open the web console and navigate to Dashboards; with the default API performance V2 option selected, each sub-page shows "No datapoints found".

Actual results:

"No datapoints found" is shown for the Dashboards default API performance V2 option, and the page is blank.

Expected results:

Graphs should be shown for the Dashboards default API performance V2 option

Additional info:
This is linked to bug https://issues.redhat.com/browse/OCPBUGS-14940; when I filed that bug, this issue was not seen.

Please review the following PR: https://github.com/openshift/cluster-autoscaler-operator/pull/271

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

Description of problem:

Since the operator watches plugins to enable dynamic plugins, it should list that resource under `status.relatedObjects` in its ClusterOperator.

Additional info:

Migrated from bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2044588

Please review the following PR: https://github.com/operator-framework/operator-marketplace/pull/515

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-19868. The following is the description of the original issue:

Description of problem:

The cluster-version operator should not crash while trying to evaluate a bogus condition.

Version-Release number of selected component (if applicable):

4.10 and later are exposed to the bug. It's possible that the OCPBUGS-19512 series increases exposure.

How reproducible:

Unclear.

Steps to Reproduce:

1. Create a cluster.
2. Point it at https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge.json (you may need to adjust version strings and digests for your test-cluster's release).
3. Wait around 30 minutes.
4. Point it at https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid-promql.json (again, may need some customization).

Actual results:

$ grep -B1 -A15 'too fresh' previous.log
I0927 12:07:55.594222       1 cincinnati.go:114] Using a root CA pool with 0 root CA subjects to request updates from https://raw.githubusercontent.com/shellyyang1989/upgrade-cincy/master/cincy-conditional-edge-invalid-promql.json?arch=amd64&channel=stable-4.15&id=dc628f75-7778-457a-bb69-6a31a243c3a9&version=4.15.0-0.test-2023-09-27-091926-ci-ln-01zw7kk-latest
I0927 12:07:55.726463       1 cache.go:118] {"type":"PromQL","promql":{"promql":"0 * group(cluster_version)"}} is the most stale cached cluster-condition match entry, but it is too fresh (last evaluated on 2023-09-27 11:37:25.876804482 +0000 UTC m=+175.082381015).  However, we don't have a cached evaluation for {"type":"PromQL","promql":{"promql":"group(cluster_version_available_updates{channel=buggy})"}}, so attempt to evaluate that now.
I0927 12:07:55.726602       1 cache.go:129] {"type":"PromQL","promql":{"promql":"0 * group(cluster_version)"}} is stealing this cluster-condition match call for {"type":"PromQL","promql":{"promql":"group(cluster_version_available_updates{channel=buggy})"}}, because its last evaluation completed 30m29.849594461s ago
I0927 12:07:55.758573       1 cvo.go:703] Finished syncing available updates "openshift-cluster-version/version" (170.074319ms)
E0927 12:07:55.758847       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 194 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x1c4df00?, 0x32abc60})
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001489d40?})
        /go/src/github.com/openshift/cluster-version-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x75
panic({0x1c4df00, 0x32abc60})
        /usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/openshift/cluster-version-operator/pkg/clusterconditions/promql.(*PromQL).Match(0xc0004860e0, {0x220ded8, 0xc00041e550}, 0x0)
        /go/src/github.com/openshift/cluster-version-operator/pkg/clusterconditions/promql/promql.go:134 +0x419
github.com/openshift/cluster-version-operator/pkg/clusterconditions/cache.(*Cache).Match(0xc0002d3ae0, {0x220ded8, 0xc00041e550}, 0xc0033948d0)
        /go/src/github.com/openshift/cluster-version-operator/pkg/clusterconditions/cache/cache.go:132 +0x982
github.com/openshift/cluster-version-operator/pkg/clusterconditions.(*conditionRegistry).Match(0xc000016760, {0x220ded8, 0xc00041e550}, {0xc0033948a0, 0x1, 0x0?})

Expected results:

No panics.

Additional info:

I'm still not entirely clear on how OCPBUGS-19512 would have increased exposure.

Description of problem:

The Metrics tab in the Pipeline details page is not added using the console extension, so a duplicate Metrics tab is shown when it is added by a dynamic plugin. Add a flag to hide the static plugin Metrics tab.

Manifests are copied from the object store (either S3 or pod) into the node that is performing the role of bootstrap during installation (or to the single node in an SNO setup)

They are copied into one of two directories according to the directory into which they were uploaded to the object store.

<cluster-id>/manifests/manifests/* will end up being copied to /run/ephemeral/var/opt/openshift/manifests/
<cluster-id>/manifests/openshift/* will end up being copied to /run/ephemeral/var/opt/openshift/openshift/manifest

After this step, any files that have been written to /run/ephemeral/var/opt/openshift/openshift/ are also copied to /run/ephemeral/var/opt/openshift/manifests/; any identically named files are overwritten as part of this operation.

https://github.com/openshift/installer/blob/1e9209ac80ed2cb4ba5663f519e51161a1d8858a/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L71C1-L71C27

This behaviour is entirely expected and correct; however, it does lead to an issue if a user chooses to upload files with identical names to both directories, for example:

File 1: <cluster-id>/manifests/manifests/manifest1.yaml
File 2: <cluster-id>/manifests/openshift/manifest1.yaml

In that case, only File 2 would end up being applied and File 1 would be overwritten during the bootkube phase.

We should prevent this from happening by treating any attempt to introduce the same file in two places as illegal, meaning that if File 2 is present, we should prevent the upload of File 1 and vice versa during the creation/update of a manifest.
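A sketch of that validation idea; the names and data structures are illustrative, not the assisted-service API:

```go
// Sketch: reject a manifest upload when a file with the same base name already
// exists in the other manifest folder, since the bootkube copy step would
// silently overwrite one of the two copies.
package main

import "fmt"

// existing maps folder name ("manifests" or "openshift") to the set of file
// names already uploaded for the cluster.
func validateManifestUpload(existing map[string]map[string]bool, folder, fileName string) error {
	other := "openshift"
	if folder == "openshift" {
		other = "manifests"
	}
	if existing[other][fileName] {
		return fmt.Errorf("manifest %q already exists in folder %q; uploading it to %q as well would overwrite one copy during bootstrap",
			fileName, other, folder)
	}
	return nil
}

func main() {
	existing := map[string]map[string]bool{
		"manifests": {"manifest1.yaml": true},
		"openshift": {},
	}
	fmt.Println(validateManifestUpload(existing, "openshift", "manifest1.yaml")) // error
	fmt.Println(validateManifestUpload(existing, "openshift", "manifest2.yaml")) // <nil>
}
```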

 

Description of problem:

Dockerfile.upi.ci.rhel8 does not work with the following error:

[3/3] STEP 26/32: RUN chown 1000:1000 /output && chmod -R g=u "$HOME/.bluemix/"
chmod: cannot access '/root/.bluemix/': No such file or directory
error: build error: building at STEP "RUN chown 1000:1000 /output && chmod -R g=u "$HOME/.bluemix/"": while running runtime: exit status 1

Version-Release number of selected component (if applicable):

master (and possibly all other branches where the ibmcli tool was introduced)

How reproducible:

always

Steps to Reproduce:

1. Try to use Dockerfile.ci.upi.rhel8
2.
3.

Actual results:

[3/3] STEP 26/32: RUN chown 1000:1000 /output && chmod -R g=u "$HOME/.bluemix/" chmod: cannot access '/root/.bluemix/': No such file or directory error: build error: building at STEP "RUN chown 1000:1000 /output && chmod -R g=u "$HOME/.bluemix/"": while running runtime: exit status 1

Expected results:

No failures

Additional info:

We should also change from downloading the govc image with curl to importing it from the cached container in quay.io, as is done in Dockerfile.ci.upi.

Description of problem:

The CRL list is capped at 1MB due to the configmap max size. If multiple public CRLs are needed for the ingress controller, the CRL PEM file will be over 1MB. 

Version-Release number of selected component (if applicable):

 

How reproducible:

100%

Steps to Reproduce:

1. Create CRL configmap with the following distribution points: 

         Issuer: C=US, O=DigiCert Inc, CN=DigiCert Global G2 TLS RSA SHA256 2020 CA1
         Subject: SOME SIGNED CERT
         X509v3 CRL Distribution Points:
                Full Name:
                  URI:http://crl3.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-2.cr
       
      
# curl -o DigiCertGlobalG2TLSRSASHA2562020CA1-2.crl http://crl3.digicert.com/DigiCertGlobalG2TLSRSASHA2562020CA1-2.crl
# openssl crl -in  DigiCertGlobalG2TLSRSASHA2562020CA1-2.crl -inform DER -out  DigiCertGlobalG2TLSRSASHA2562020CA1-2.pem 
# du -bsh DigiCertGlobalG2TLSRSASHA2562020CA1-2.pem 
604K    DigiCertGlobalG2TLSRSASHA2562020CA1-2.pem


I still need to find more intermediate CRLs to grow this. 

Actual results:

2023-01-25T13:45:01.443Z ERROR operator.init controller/controller.go:273 Reconciler error {"controller": "crl", "object": {"name":"custom","namespace":"openshift-ingress-operator"}, "namespace": "openshift-ingress-operator", "name": "custom", "reconcileID": "d49d9b96-d509-4562-b3d9-d4fc315226c0", "error": "failed to ensure client CA CRL configmap for ingresscontroller openshift-ingress-operator/custom: failed to update configmap: ConfigMap \"router-client-ca-crl-custom\" is invalid: []: Too long: must have at most 1048576 bytes"}

Expected results:

First, be able to create a configmap where only the data counts toward the 1MB max (see additional info below for more details); second, provide some way to compress or otherwise allow a CRL list larger than 1MB.

Additional info:

Even using only this CRL, at about 600K, still causes the issue; it could be due to the `last-applied-configuration` annotation on the configmap. This is added because we do an apply operation (update) on the configmap. I am not sure whether this counts towards the 1MB max. 

https://github.com/openshift/cluster-ingress-operator/blob/release-4.10/pkg/operator/controller/crl/crl_configmap.go#L295 

Not sure if we could just replace the configmap.   
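To make the annotation concern concrete, a back-of-the-envelope sketch, under the assumption that a client-side apply keeps a full serialized copy of the data in the last-applied-configuration annotation (so the effective CRL budget is roughly half the object limit):

```go
// Sketch: estimate whether a CRL bundle fits in a configmap once the
// last-applied-configuration copy is accounted for. Numbers are illustrative.
package main

import "fmt"

const maxObjectBytes = 1 * 1024 * 1024 // approximate apiserver/etcd object size limit

func fitsInConfigMap(crlPEMs [][]byte, countAnnotationCopy bool) (int, bool) {
	total := 0
	for _, pem := range crlPEMs {
		total += len(pem)
	}
	if countAnnotationCopy {
		// A client-side apply stores a serialized copy of the object in the
		// kubectl.kubernetes.io/last-applied-configuration annotation, which
		// roughly doubles the stored size of the PEM data.
		total *= 2
	}
	return total, total < maxObjectBytes
}

func main() {
	crl := make([]byte, 604*1024) // ~604K CRL from the example above
	size, ok := fitsInConfigMap([][]byte{crl}, true)
	fmt.Printf("estimated bytes: %d, fits: %v\n", size, ok)
}
```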

 

Description of problem:

https://github.com/openshift/openshift-docs/pull/59549#discussion_r1184195239

Per the discussion here, the text in the dev console when creating a function says a func.yaml file must be present OR it must use the s2i build strategy, when in fact both things are required.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Go to +Add -> Create Serverless function and use a repo URL that doesn't fit the requirements in order to see the error

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

https://search.ci.openshift.org/?search=error%3A+tag+latest+failed%3A+Internal+error+occurred%3A+registry.centos.org&maxAge=48h&context=1&type=build-log&name=okd&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Version-Release number of selected component (if applicable):

all currently tested versions

How reproducible:

~ 9% of jobs fail on this test

 

 ! error: Import failed (InternalError): Internal error occurred: registry.centos.org/dotnet/dotnet-31-runtime-centos7:latest: Get "https://registry.centos.org/v2/": dial tcp: lookup registry.centos.org on 172.30.0.10:53: no such host   782 31 minutes ago 

 

Description of the problem:
Assisted installer namespace `assisted-installer` is not compliant with the `ocp4-cis-configure-network-policies-namespaces` Compliance Operator scan.

How reproducible:
Everytime
 
Steps to reproduce:

1. Install a cluster with the Assisted Installer
2. Confirm the `assisted-installer` Namespace is present and not removed
3. Install the Red Hat Compliance Operator
4. Run a compliance scan using the `ocp4-cis` profile

Actual results:
Cluster fails the scan with the following warning
```
Ensure that application Namespaces have Network Policies defined high
fail
```

Expected results:
Cluster does not fail the scan
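For reference, a minimal default-deny NetworkPolicy in the namespace would typically satisfy this rule; the sketch below only builds the object with the Kubernetes Go types, and the exact allow rules the assisted-installer pods actually need are out of scope here:

```go
// Sketch: a default-deny ingress NetworkPolicy for the assisted-installer
// namespace, expressed with the Kubernetes Go types and printed as YAML.
package main

import (
	"fmt"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	policy := networkingv1.NetworkPolicy{
		TypeMeta: metav1.TypeMeta{APIVersion: "networking.k8s.io/v1", Kind: "NetworkPolicy"},
		ObjectMeta: metav1.ObjectMeta{
			Name:      "default-deny",
			Namespace: "assisted-installer",
		},
		Spec: networkingv1.NetworkPolicySpec{
			// Empty selector: applies to all pods in the namespace.
			PodSelector: metav1.LabelSelector{},
			// No ingress rules listed, so all ingress traffic is denied.
			PolicyTypes: []networkingv1.PolicyType{networkingv1.PolicyTypeIngress},
		},
	}
	out, err := yaml.Marshal(policy)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```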

Description of problem:

In an IPv6 environment using DHCP, it may not be possible to configure a rendezvousIP that matches the actual address. This is because by default NetworkManager uses DUID-UUIDs for the Client ID in the IPv6 DHCP Solicitation (see https://datatracker.ietf.org/doc/html/rfc6355), which are machine dependent. As a result, the DHCPv6 server cannot be configured with a pre-determined Client ID/IPv6 Address pair that matches the rendezvousIP, and the nodes will be assigned random IPv6 addresses from the pool of DHCP addresses.

We can see the flow here (the DUID-UUID has a 00:04 prefix)

DHCPSOLICIT(ostestbm) 00:04:56:d2:b1:0b:ba:ef:8c:1a:00:58:3f:ed:e5:d3:5f:85

The DHCP server therefore assigns a new address from the pool, fd2e:6f44:5dd8:c956::32 in this case:
DHCPREPLY(ostestbm) fd2e:6f44:5dd8:c956::32 00:04:56:d2:b1:0b:ba:ef:8c:1a:00:58:3f:ed:e5:d3:5f:85

NetworkManager needs to be configured to use a deterministic Client ID so that a reliable Client ID/IPv6 address pair can be added to a DHCP server. The best way to do this is to configure NM with dhcp-duid=ll so that it uses a DUID-LL, which is based on the interface MAC address. This is the approach taken by Baremetal IPI in   https://github.com/openshift/machine-config-operator/pull/1395

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

Every time

Steps to Reproduce:

1. In an IPv6 environment set up agent-config.yaml with an expected IPv6 address and create the ISO
2. It's not possible to configure the DHCP server to assign this address since the Client ID that Node0 will use is unknown
3. Boot the nodes using the created ISO. The nodes will get IPv6 addresses from the DHCP server, but it's not possible to access the rendezvousIP

Actual results:

 

Expected results:

 

Additional info:

 

controller: Drop noisy log message about certificates

I often turn to the controller pod logs to debug issues, and
this log message is repeated very often. While it was
probably useful at the time the feature was being developed/tested
I doubt it will be necessary in the future.

In the end, the status really is the debugging frontend I believe.


controller: Drop noisy BaseOSContainerImage log message

In general we should avoid logging unless something changed.
I don't believe we need this log message, we can detect OS
changes from e.g. the MCD logs.

Description of problem:

Our telemetry contains only the vCenter version ("7.0.3") and not the exact build number. We need the build number to know what exact vCenter build the user has and what bugs are fixed there (e.g. https://issues.redhat.com/browse/OCPBUGS-5817).

 

Description of problem:

Not able to convert a Deployment to a Serverless resource, as the Make Serverless form in the console is broken.

Version-Release number of selected component (if applicable):

4.13

How reproducible:

 

Steps to Reproduce:

1. Create a deployment using a Container image flow
2. Select Make Serverless option from the topology actions menu of the created deployment
3.

Actual results:

After clicking on Create, it throws an error

Expected results:

Should create a Serverless resource.

Additional info:

 

Description of problem:

When testing AWS on-prem BM expansion, the BMO is not able to reach the IRONIC_ENDPOINT

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-10-021647

How reproducible:

100%

Steps to Reproduce:

1. Install IPI AWS 3-node-compact cluster
2. Deploy BMO via YAML
3. Connect AWS against external on-prem env via VPN (out of scope)
4. Create BMH using "preprovisioningNetworkDataName" to push static IP and routes.

Actual results:

BMO is not able to reach the Ironic endpoint with the following error:

~~~
2023-08-10T16:09:22.216778289Z {"level":"info","ts":"2023-08-10T16:09:22Z","logger":"provisioner.ironic","msg":"error caught while checking endpoint","host":"openshift-machine-api~openshift-qe-065","endpoint":"https://metal3-state.openshift-machine-api.svc.cluster.local:6385/v1/","error":"Get \"https://metal3-state.openshift-machine-api.svc.cluster.local:6385/v1\": dial tcp 172.30.19.119:6385: i/o timeout"}
~~~

Expected results:

Standard deploy

Additional info:

Must-gather provided separately

OCP 4.14.0-rc.0
advanced-cluster-management.v2.9.0-130
multicluster-engine.v2.4.0-154

After encountering https://issues.redhat.com/browse/OCPBUGS-18959

Attempted to forcefully delete the BMH by removing the finalizer.
Then deleted all the metal3 pods.

Attempted to re-create the bmh.

Result:
the bmh is stuck in

oc get bmh
NAME                                           STATE         CONSUMER   ONLINE   ERROR   AGE
hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com   registering              true             15m

seeing this entry in the BMO log:

{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"controllers.BareMetalHost","msg":"start","baremetalhost":{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"}}
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"controllers.BareMetalHost","msg":"hardwareData is ready to be deleted","baremetalhost":{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"}}
{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"controllers.BareMetalHost","msg":"host ready to be powered off","baremetalhost":

{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"}

,"provisioningState":"powering off before delete"}

{"level":"info","ts":"2023-09-13T16:15:57Z","logger":"provisioner.ironic","msg":"ensuring host is powered off (mode: hard)","host":"kni-qe-65~hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com"}

{"level":"error","ts":"2023-09-13T16:15:57Z","msg":"Reconciler error","controller":"baremetalhost","controllerGroup":"metal3.io","controllerKind":"BareMetalHost","BareMetalHost":

{"name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","namespace":"kni-qe-65"}

,"namespace":"kni-qe-65","name":"hp-e910-01.kni-qe-65.lab.eng.rdu2.redhat.com","reconcileID":"167061cc-7ab4-4c4a-ae45-8c19dfc3ac22","error":"action \"powering off before delete\" failed: failed to power off before deleting node: Host not registered","errorVerbose":"Host not registered\nfailed to power off before deleting node\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).actionPowerOffBeforeDeleting\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:493\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).handlePoweringOffBeforeDelete\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:585\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*hostStateMachine).ReconcileState\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/host_state_machine.go:202\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:225\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598\naction \"powering off before delete\" 
failed\ngithub.com/metal3-io/baremetal-operator/controllers/metal3%2eio.(*BareMetalHostReconciler).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/controllers/metal3.io/baremetalhost_controller.go:229\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1598","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/src/github.com/metal3-io/baremetal-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226"}

Context:

As we start receiving metrics consistently in OCM environments, and as we create SLO dashboards that can consume data from any data source (prod/stage/CI), we also want to revisit how we send metrics and make sure we are doing it in the most effective way. We currently have some wonky data coming through in prod.

DoD:

At the moment we have a high-frequency reconciliation loop where we constantly review the overall state of the world by looping over all clusters.

We should review this approach and, where possible, record each metric/event once, directly in the controllers/reconcile loop as it happens, rather than repeatedly in a polling loop. A sketch of that pattern follows.
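A minimal sketch of that pattern with controller-runtime, assuming a hypothetical cluster_provision_result_total counter; the state-transition detection is a placeholder, not the actual OCM controller code:

package controllers

import (
	"context"

	"github.com/prometheus/client_golang/prometheus"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)

// Hypothetical counter: incremented exactly once per observed state
// transition, not on every pass of a periodic "state of the world" loop.
var clusterProvisionResult = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "cluster_provision_result_total",
		Help: "Provisioning outcomes, recorded at the moment the controller observes them.",
	},
	[]string{"result"},
)

func init() {
	// controller-runtime exposes this registry on its /metrics endpoint.
	metrics.Registry.MustRegister(clusterProvisionResult)
}

type ClusterReconciler struct{}

func (r *ClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... fetch the cluster and detect a state transition ...
	transitionedToReady := true // placeholder for real detection logic

	if transitionedToReady {
		// Record once, at the moment the event actually happens.
		clusterProvisionResult.WithLabelValues("ready").Inc()
	}
	return ctrl.Result{}, nil
}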

This is a clone of issue OCPBUGS-19019. The following is the description of the original issue:

Using metal-ipi with okd-scos ironic fails to provision nodes

This is a clone of issue OCPBUGS-8512. The following is the description of the original issue:

Description of problem:

WebhookConfiguration caBundle injection is incorrect when some webhooks already configured with caBundle.

The behavior seems to be that only the first n webhooks in the `.webhooks` array have caBundle injected, where n is the number of webhooks that do not have caBundle set.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Create a validatingwebhookconfigurations or mutatingwebhookconfigurations with `service.beta.openshift.io/inject-cabundle: "true"` annotation.

2. oc edit validatingwebhookconfigurations (or oc edit mutatingwebhookconfigurations)

3. Add a new webhook to the end of the list `.webhooks`. It will not have caBundle set manually as service-ca should inject it. 

4. Observe new webhook does not get caBundle injected.

Note: it is important in step 3 that the new webhook is added to the end of the list.

 

Actual results:

Only the first n webhooks have caBundle injected where n is the number of webhooks without caBundle set.

Expected results:

All webhooks have caBundle injected when they do not have it set.

Additional info:

Open PR here: https://github.com/openshift/service-ca-operator/pull/207

The issue seems to be a mistake with Go's for-range syntax: the loop index is used where the stored index value should be.

tl;dr: the code should use the value of the int stored in the array (the webhook index to update), not the position of that int within the array.
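Not the actual service-ca-operator code, but a self-contained illustration of that class of for-range mistake and its fix:

package main

import "fmt"

func main() {
	// Indices of webhooks that still need a caBundle injected.
	// Suppose webhooks 0 and 1 already have one, so only 2 and 3 are listed.
	needsInjection := []int{2, 3}
	webhooks := []string{"a", "b", "c", "d"}
	injected := make([]bool, len(webhooks))

	// Buggy pattern: "i" is the position within needsInjection,
	// not the webhook index itself, so webhooks 0 and 1 get patched instead.
	for i := range needsInjection {
		injected[i] = true
	}
	fmt.Println("buggy:", injected) // [true true false false]

	// Fixed pattern: use the stored value, not the loop index.
	injected = make([]bool, len(webhooks))
	for _, idx := range needsInjection {
		injected[idx] = true
	}
	fmt.Println("fixed:", injected) // [false false true true]
}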

Owner: Architect:

Story (Required)

As an ODC Helm backend developer, I would like to bump the Helm version to 3.12 to stay in sync with the version we will ship with OCP 4.14.

Background (Required)

Normal activity we do every time a new OCP version is released, to stay current.

Glossary

NA

Out of scope

NA

Approach(Required)

Bump the Helm version to 3.12, then run, build, and unit test to make sure everything works as expected. Last time we had a conflict with the DevFile backend.

Dependencies

We might have dependencies on the DevFile team to move some dependencies forward.

Edge Case

NA

Acceptance Criteria

Console Helm dependency is moved to 3.12

INVEST Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Legend

Unknown
Verified
Unsatisfied

Description of the problem:
The OCI platform is available only from OCP 4.14; we shouldn't be able to create an OCI cluster with OCP < 4.14.
 

How reproducible:

You can reproduce with aicli
 

Steps to reproduce:

$ aicli --integration create cluster agentil-test-oci-19 -P platform='{"type": "oci"}' -P pull_secret=<your pull secret> -P user_managed_networking=true -P minimal=true -P openshift_version=4.13

Actual results:

 [agentil@fedora Downloads]$ aicli --integration create cluster agentil-test-oci-19 -P platform='{"type": "oci"}' -P pull_secret=~/Downloads/pull-secret.txt -P user_managed_networking=true -P minimal=true -P openshift_version=4.13
Creating cluster agentil-test-oci-19
Using karmalabs.corp as DNS domain as no one was provided
Forcing network_type to OVNKubernetes
Using version 4.13.2
Creating infraenv agentil-test-oci-19_infra-env
Using karmalabs.corp as DNS domain as no one was provided

[agentil@fedora Downloads]$ aicli --integration info cluster agentil-test-oci-19
ams_subscription_id: 2QvJWtlvlUIvFtCmOIPiwkHRirC
api_vips: []
base_dns_domain: karmalabs.corp
cluster_networks: [{'cluster_id': '65f2a1fa-efd2-419a-9bf0-802e595a0a63', 'cidr': '10.128.0.0/14', 'host_prefix': 23}]
connectivity_majority_groups: {"IPv4":[],"IPv6":[]}
controller_logs_collected_at: 0001-01-01 00:00:00+00:00
controller_logs_started_at: 0001-01-01 00:00:00+00:00
cpu_architecture: x86_64
created_at: 2023-06-08 12:42:36.327854+00:00
disk_encryption: {'enable_on': 'none', 'mode': 'tpmv2', 'tang_servers': None}
email_domain: redhat.com
feature_usage: {"Cluster Tags":{"id":"CLUSTER_TAGS","name":"Cluster Tags"},"Hyperthreading":{"data":{"hyperthreading_enabled":"all"},"id":"HYPERTHREADING","name":"Hyperthreading"},"OVN network type":{"id":"OVN_NETWORK_TYPE","name":"OVN network type"},"Platform selection":{"data":{"platform_type":"oci"},"id":"PLATFORM_SELECTION","name":"Platform selection"},"User Managed Networking With Multi Node":{"id":"USER_MANAGED_NETWORKING_WITH_MULTI_NODE","name":"User Managed Networking With Multi Node"}}
high_availability_mode: Full
hyperthreading: all
id: 65f2a1fa-efd2-419a-9bf0-802e595a0a63
ignition_endpoint: {'url': None, 'ca_certificate': None}
imported: False
ingress_vips: []
install_completed_at: 0001-01-01 00:00:00+00:00
install_started_at: 0001-01-01 00:00:00+00:00
ip_collisions: {}
machine_networks: []
monitored_operators: [{'cluster_id': '65f2a1fa-efd2-419a-9bf0-802e595a0a63', 'name': 'console', 'version': None, 'namespace': None, 'subscription_name': None, 'operator_type': 'builtin', 'properties': None, 'timeout_seconds': 3600, 'status': None, 'status_info': None, 'status_updated_at': datetime.datetime(1, 1, 1, 0, 0, tzinfo=tzutc())}, {'cluster_id': '65f2a1fa-efd2-419a-9bf0-802e595a0a63', 'name': 'cvo', 'version': None, 'namespace': None, 'subscription_name': None, 'operator_type': 'builtin', 'properties': None, 'timeout_seconds': 3600, 'status': None, 'status_info': None, 'status_updated_at': datetime.datetime(1, 1, 1, 0, 0, tzinfo=tzutc())}]
name: agentil-test-oci-19
network_type: OVNKubernetes
ocp_release_image: quay.io/openshift-release-dev/ocp-release:4.13.2-x86_64
openshift_version: 4.13.2
org_id: 11009103
platform: {'type': 'oci'}
progress: {'total_percentage': None, 'preparing_for_installation_stage_percentage': None, 'installing_stage_percentage': None, 'finalizing_stage_percentage': None}
schedulable_masters: False
schedulable_masters_forced_true: True
service_networks: [{'cluster_id': '65f2a1fa-efd2-419a-9bf0-802e595a0a63', 'cidr': '172.30.0.0/16'}]
status: insufficient
status_info: Cluster is not ready for install
status_updated_at: 2023-06-08 12:42:36.324000+00:00
tags: aicli
updated_at: 2023-06-08 12:42:43.362119+00:00
user_managed_networking: True
user_name: agentil@redhat.com

Expected results:

The cluster creation should fail because the version of OCP is incompatible with OCI platform.
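A minimal sketch of the kind of gate this expected result calls for; the function name and the use of golang.org/x/mod/semver are illustrative assumptions, not the assisted-service implementation:

package main

import (
	"fmt"

	"golang.org/x/mod/semver"
)

// validateOCIPlatform is a hypothetical check along the lines the bug asks
// for: reject the "oci" platform when the requested OpenShift version is
// older than 4.14.
func validateOCIPlatform(platformType, openshiftVersion string) error {
	if platformType != "oci" {
		return nil
	}
	// semver.Compare expects a leading "v"; "v4.13.2" < "v4.14" holds.
	if semver.Compare("v"+openshiftVersion, "v4.14") < 0 {
		return fmt.Errorf("platform oci requires OpenShift 4.14 or newer, got %s", openshiftVersion)
	}
	return nil
}

func main() {
	fmt.Println(validateOCIPlatform("oci", "4.13.2")) // error
	fmt.Println(validateOCIPlatform("oci", "4.14.0")) // <nil>
}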

Description of problem:

The info message below the "Git access token" field for creating the Pipelines Repository under the Pipelines section of the Import from Git page falls back to the default text instead of showing the curated message for each Git provider.

The info messages are curated for each Git provider when the Repository is created from the Pipelines page.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Go to the Import from Git Page
2. Add a Git URL with PAC ( https://github.com/Lucifergene/oc-pipe )
3. Check the text under the "Git access token" Field 

Actual results:

Use your Git Personal token. Create a token with repo, public_repo & admin:repo_hook scopes and give your token an expiration, i.e 30d.

Expected results:

Use your GitHub Personal token. Use this link to create a token with repo, public_repo & admin:repo_hook scopes and give your token an expiration, i.e 30d.

Additional info:

 

Description of problem:


When we merged https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/229, it changed the way failure domains were injected for Azure so that additional fields could be accounted for. However, the CPMS failure domains have Azure zones as a string (which they should be) and the machine v1beta1 spec has them as a string pointer.

This means that the CPMS now detects a difference between a nil zone and an empty string, even though every other piece of code in OpenShift treats them the same (a small illustration follows this description).

We should update the machine v1beta1 type to remove the pointer. This will be a no-op in terms of the data stored in etcd since the type is unstructured anyway.

It will then require updates to the MAPZ, CPMS, MAO and installer repositories to update their generation.
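A small, self-contained illustration of why the pointer type matters here: a nil pointer and a pointer to "" only differ when the field is a *string; with plain strings the two collapse into the same value. The variable names are illustrative, not the actual API fields.

package main

import (
	"fmt"
	"reflect"
)

func main() {
	// With a *string zone, an unset zone (nil) and an empty zone ("")
	// are different values, so a diff is detected.
	var machineZone *string // nil: field omitted on the Machine
	empty := ""
	cpmsZone := &empty // failure domain rendered as ""

	fmt.Println(reflect.DeepEqual(machineZone, cpmsZone)) // false -> spurious diff

	// With plain strings (the proposed type change) no difference is detected.
	var a, b string
	fmt.Println(a == b) // true
}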

Version-Release number of selected component (if applicable):

4.14 nightlies from the merge of 229 onwards

How reproducible:

This only affects Azure regions that have no zones; currently in CI it's affecting about 20% of events.

Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


Description of problem:

While installing a cluster with the assisted installer, we have lately seen cases where one master joins very quickly and starts all the pods needed for cluster bootstrap to finish, but the second one joins only after that.
Keepalived can't start when only one master has joined, as it doesn't have enough data to build its configuration files.
In HA mode, cluster bootstrap should wait for at least 2 joined masters before removing the bootstrap control plane; without that, the installation will fail. A sketch of that gating follows.
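A minimal sketch of that gating with client-go; the label selector, poll interval, and function name are illustrative assumptions, not the actual cluster-bootstrap code:

package bootstrapgate

import (
	"context"
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// waitForMasters blocks until at least minReady control-plane nodes report
// Ready, i.e. the condition the bug asks cluster bootstrap to check before
// tearing down the bootstrap control plane.
func waitForMasters(ctx context.Context, client kubernetes.Interface, minReady int) error {
	for {
		nodes, err := client.CoreV1().Nodes().List(ctx, metav1.ListOptions{
			LabelSelector: "node-role.kubernetes.io/master",
		})
		if err != nil {
			return err
		}
		ready := 0
		for _, n := range nodes.Items {
			for _, c := range n.Status.Conditions {
				if c.Type == corev1.NodeReady && c.Status == corev1.ConditionTrue {
					ready++
				}
			}
		}
		if ready >= minReady {
			fmt.Printf("%d masters ready, safe to remove bootstrap control plane\n", ready)
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(10 * time.Second):
		}
	}
}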
 

Version-Release number of selected component (if applicable):

 

How reproducible:

Start bm installation and start one master, wait till it starts all required pods and then add others.

Steps to Reproduce:

1. Start bm installation 
2. Start one master 
3. Wait till it starts all required pods.
4. Add others

Actual results:

no vip, installation fails

Expected results:

installation succeeds, vip moves to master

Additional info:

 

Description of problem:


MCO has duplicate feature flags set for Kubelet causing errors on bringup.

I0421 15:32:04.308472    2135 codec.go:98] "Using lenient decoding as strict decoding failed" err=<
Apr 21 15:32:04 ip-10-0-156-156 kubenswrapper[2135]:         strict decoding error: yaml: unmarshal errors:
Apr 21 15:32:04 ip-10-0-156-156 kubenswrapper[2135]:           line 29: key "RotateKubeletServerCertificate" already set in map
Apr 21 15:32:04 ip-10-0-156-156 kubenswrapper[2135]:  >

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:

1.
2.
3.

Actual results:


Expected results:


Additional info:


From deads2k: I think creating pods that should get rejected in the kube-system namespace would ensure it.  OCP-classic is still struggling with customers who did naughty things.

Description of problem:

Deployed an OCP cluster using the HyperShift agent platform with the 4.14.0-ec.4 release on Power.
We are observing that loading the OperatorHub page in the GUI throws a 404 error.

Version-Release number of selected component (if applicable):

OCP 4.14.0-ec.4

How reproducible:

Every time

Steps to Reproduce:

1. Deploy Hypershift cluster 
2. Go to GUI and check OperatorHub
3. 

Actual results:

OperatorHub page in GUI is throwing 404 error

Expected results:

OperatorHub page should show Operators

Additional information:

Failure status in olm operator pod from management cluster:

# oc get pod olm-operator-754779f559-846tw -n clusters-hypershift-015 -oyaml

        message: |
          'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" monitor=clusteroperator
          time="2023-08-17T10:58:37Z" level=error msg="initialization error - failed to ensure name=\"\" - ClusterOperator.config.openshift.io \"\\\"\\\"\" is invalid: metadata.name: Invalid value: \"\\\"\\\"\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" monitor=clusteroperator
          time="2023-08-17T10:59:37Z" level=error msg="initialization error - failed to ensure name=\"\" - ClusterOperator.config.openshift.io \"\\\"\\\"\" is invalid: metadata.name: Invalid value: \"\\\"\\\"\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" monitor=clusteroperator
          time="2023-08-17T11:00:37Z" level=error msg="initialization error - failed to ensure name=\"\" - ClusterOperator.config.openshift.io \"\\\"\\\"\" is invalid: metadata.name: Invalid value: \"\\\"\\\"\": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')" monitor=clusteroperator
          I0817 11:01:33.000390       1 trace.go:205] Trace[2006040218]: "DeltaFIFO Pop Process" ID:system:controller:route-controller,Depth:152,Reason:slow event handlers blocking the queue (17-Aug-2023 11:01:28.947) (total time: 456ms):
          Trace[2006040218]: [456.950035ms] [456.950035ms] END
          2023/08/17 11:01:41 http: TLS handshake error from 10.244.0.10:33355: read tcp 172.17.53.0:8443->10.244.0.10:33355: read: connection reset by peer
        reason: Error
        startedAt: "2023-08-14T11:03:46Z" 

Screenshot: https://drive.google.com/file/d/1I_XkX15xEl9ZBtAIZ2yp70twD4z2ASlS/view?usp=sharing

Must gather logs:

https://drive.google.com/file/d/1AkmzC_TUi9z6p13funrSygBm2CgepbpU/view?usp=sharing

Due to removal of in-tree AWS provider https://github.com/kubernetes/kubernetes/pull/115838 we need to ensure that KCM is setting --external-cloud-volume-plugin flag accordingly, especially that the CSI migration was GA-ed in 4.12/1.25.

The original PR that fixed this (https://github.com/openshift/cluster-kube-controller-manager-operator/pull/721) got reverted by mistake. We need to bring it back to unblock the kube rebase.

Description of problem:

Hide the Duplicate Pipelines Card in the DevConsole Add Page

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always

Steps to Reproduce:

1. Visit +Add Page of Dev Perspective

Actual results:

Duplicate Entry

Expected results:

No duplicates

Additional info:

 

Description of problem:

Install issues for 4.14 and 4.15 where we lose contact with the kubelet on master nodes.

https://search.ci.openshift.org/?search=Kubelet+stopped+posting+node+status&maxAge=168h&context=1&type=build-log&name=periodic.*4.14.*azure.*sdn&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

This search shows it's happening on about 35% of Azure SDN 4.14 jobs over the past week at least. There are no OVN hits.

1703590387039342592/artifacts/e2e-azure-sdn-upgrade/gather-extra/artifacts/nodes.json

                    {
                        "lastHeartbeatTime": "2023-09-18T02:33:11Z",
                        "lastTransitionTime": "2023-09-18T02:35:39Z",
                        "message": "Kubelet stopped posting node status.",
                        "reason": "NodeStatusUnknown",
                        "status": "Unknown",
                        "type": "Ready"
                    }

4.14 is interesting as it is a minor upgrade from 4.13 and we see the install failures with a master node dropping out.

Focusing on periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-azure-sdn-upgrade/1703590387039342592

Build log shows

INFO[2023-09-18T02:03:03Z] Using explicitly provided pull-spec for release initial (registry.ci.openshift.org/ocp/release:4.13.0-0.ci-2023-09-17-050449) 

ipi-azure-conf shows region centralus (not the single zone westus)

get ocp version: 4.13
/output
Azure region: centralus

oc_cmds/nodes shows master-1 not ready

ci-op-82xkimh8-0dd98-9g9wh-master-1                  NotReady   control-plane,master   82m   v1.26.7+c7ee51f   10.0.0.6      <none>        Red Hat Enterprise Linux CoreOS 413.92.202309141211-0 (Plow)  

ci-op-82xkimh8-0dd98-9g9wh-master-1-boot.log shows ignition

install log shows we have lost contact

time="2023-09-18T03:15:33Z" level=error msg="Cluster operator kube-apiserver Degraded is True with GuardController_SyncError::NodeController_MasterNodesReady: GuardControllerDegraded: [Missing operand on node ci-op-82xkimh8-0dd98-9g9wh-master-0, Missing operand on node ci-op-82xkimh8-0dd98-9g9wh-master-2]\nNodeControllerDegraded: The master nodes not ready: node \"ci-op-82xkimh8-0dd98-9g9wh-master-1\" not ready since 2023-09-18 02:35:39 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)"

4.15 4.15.0-0.ci-2023-09-17-172341 and 4.14 4.14.0-0.ci-2023-09-18-020137

Version-Release number of selected component (if applicable):

 

How reproducible:

We are seeing this on a high number of failed payloads for 4.14 and 4.15. Additional recent failures:

4.14.0-0.ci-2023-09-17-012321
aggregated-azure-sdn-upgrade-4.14-minor shows failures like: Passed 5 times, failed 0 times, skipped 0 times: we require at least 6 attempts to have a chance at success indicating that only 5 of the 10 runs were valid.
Checking install logs shows we have lost master-2

time="2023-09-17T02:44:22Z" level=error msg="Cluster operator kube-apiserver Degraded is True with GuardController_SyncError::NodeController_MasterNodesReady: GuardControllerDegraded: [Missing operand on node ci-op-crj5cf00-0dd98-p5snd-master-1, Missing operand on node ci-op-crj5cf00-0dd98-p5snd-master-0]\nNodeControllerDegraded: The master nodes not ready: node \"ci-op-crj5cf00-0dd98-p5snd-master-2\" not ready since 2023-09-17 02:01:49 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)"

oc_cmds/nodes also shows master-2 not ready

4.15.0-0.nightly-2023-09-17-113421 install analysis failed due to azure tech preview oc_cmds/nodes shows master-1 not ready

4.15.0-0.ci-2023-09-17-112341 aggregated-azure-sdn-upgrade-4.15-minor only 5 of 10 runs are valid sample oc_cmds/nodes shows master-0 not ready

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-18662. The following is the description of the original issue:

Description of problem:
RPS configuration test failed with the following error:

[FAILED] Failure recorded during attempt 1:
a host device rps mask is different from the reserved CPUs; have "0" want ""
Expected
    <bool>: false
to be true
In [It] at: /tmp/cnf-ZdGbI/cnf-features-deploy/vendor/github.com/onsi/gomega/internal/assertion.go:62 @ 09/06/23 03:47:44.144
< Exit [It] [test_id:55012] Should have the correct RPS configuration - /tmp/cnf-ZdGbI/cnf-features-deploy/vendor/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/performance.go:337 @ 09/06/23 03:47:44.144 (39.949s)

Full report: 

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.15-e2e-telco5g-cnftests/1699249554244767744/artifacts/e2e-telco5g-cnftests/telco5g-cnf-tests/artifacts/test_results.html 

How reproducible:

Very often

Steps to Reproduce:
1. Reproduce automatically by the cnf-tests nightly job

Actual results:
Some of the virtual devices are not configured with the correct RPS mask

Expected results:
All virtual network devices are expected to have the correct RPS mask

Please review the following PR: https://github.com/openshift/cloud-provider-nutanix/pull/18

The PR has been automatically opened by ART (#forum-ocp-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #forum-ocp-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

The agent does not replace localhost.localdomain node names with MAC addresses
when the cluster network configuration is static IPs with a VLAN.
Found in the agent log:
Dec 20 17:37:42 localhost.localdomain inventory[2284]: time="20-12-2022 17:37:42" level=info msg="Replaced original forbidden hostname with calculated one" file="inventory.go:63" calculated=localhost.localdomain original=localhost.localdomain

As a result the cluster is not ready, with the message:
The cluster is not ready yet. Some hosts have an ineligible name. To change the hostname, click on it.

How reproducible:
1. Provision libvirt VMs and network with VLAN
2. Create cluster and select Static IP Network configuration
3. Fill all required filed in from view and press Next
4. Generate and download ISO
5. Wait until nodes will be UP and discovered

Actual results:
Nodes have localhost.localdomain names
 
Expected results:
Nodes are named after the host's MAC address

Description of the problem:
e2e-metal-assisted-day2-arm-workers-periodic job fails to install the day2 ARM worker because the service marks the setup as incompatible:

  time="2023-04-04T12:03:37Z" level=error msg="cannot use arm64 architecture because it's not compatible on version  of OpenShift" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).handlerClusterInfoOnRegisterInfraEnv" file="/assisted-service/internal/bminventory/inventory.go:4466" pkg=Inventory
time="2023-04-04T12:03:37Z" level=error msg="Failed to register InfraEnv test-infra-infra-env-fd527e12 with id 3e21770d-d607-431c-967c-5f632bec0cfb. Error: cannot use arm64 architecture because it's not compatible on version  of OpenShift" func="github.com/openshift/assisted-service/internal/bminventory.(*bareMetalInventory).RegisterInfraEnvInternal.func1" file="/assisted-service/internal/bminventory/inventory.go:4528" cluster_id=3e21770d-d607-431c-967c-5f632bec0cfb go-id=235 pkg=Inventory request_id=f8dd7eeb-efa7-4828-a8c5-e1486a8bc1d2

See https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_assisted-test-infra/2109/pull-ci-openshift-assisted-test-infra-master-e2e-metal-assisted-day2-arm-workers/1643199500098998272

How reproducible:

Run the job e2e-metal-assisted-day2-arm-workers which:

  • install a day1 x86 cluster
  • Add a day2 ARM worker to the day1 x86 cluster

Steps to reproduce:

1.

2.

3.

Actual results:

The job fails to add the day2 worker and the assisted service log shows:
"Error: cannot use arm64 architecture because it's not compatible on version of OpenShift"
 

Expected results:

The installation of the day2 ARM worker succeeds without errors.

Elior Erez I assign this ticket to you as it looks like it is linked to the feature support code, can you have a look?

Please review the following PR: https://github.com/openshift/images/pull/133

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-4038. The following is the description of the original issue:

Description of problem:

OKD installer attempts to enable systemd-journal-gatewayd.socket, which is not present on FCOS

Version-Release number of selected component (if applicable):

4.13

Description of problem:

If both of the annotations below are used on an operator CSV, the uninstall instructions don't show up in the UI.
- console.openshift.io/disable-operand-delete: "true"
- operator.openshift.io/uninstall-message: "some message"

Version-Release number of selected component (if applicable):

➜  $> oc version
Client Version: 4.12.0
Kustomize Version: v4.5.7
Server Version: 4.13.0-rc.5
Kubernetes Version: v1.26.3+379cd9f

➜  $> oc get co | grep console
console                                    4.13.0-rc.5   True        False         False      4h49m

How reproducible:

Always

Steps to Reproduce:

1.Add both the mentioned annotations on an operator CSV. 
2. Make sure "console.openshift.io/disable-operand-delete" is set to "true".
3.Upon clicking "Uninstall operator", the result can be observed on the pop-up.

Actual results:

The uninstall pop-up doesn't have the "Message from Operator developer" section.

Expected results:

The uninstall instructions should show up under "Message from Operator developer".

Additional info:

The two annotations seemed to be linked here, https://github.com/openshift/console/blob/3e0bb0928ce09030bc3340c9639b2a1df9e0a007/frontend/packages/operator-lifecycle-manager/src/components/modals/uninstall-operator-modal.tsx#LL395C10-L395C26

Description of problem:

Due to https://github.com/openshift/cluster-monitoring-operator/pull/1986, the prometheus-operator was instructed to inject the app.kubernetes.io/part-of: openshift-monitoring label (via its --labels option) to resources it creates.

The label is also applied to the StatefulSet's selector (matchLabels), which is immutable, so applying it forces the StatefulSet to be deleted and recreated.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

upgrade to a 4.14 version with the commit https://github.com/openshift/cluster-monitoring-operator/pull/1986

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

We should avoid recreating the statefulset as this leads to downtime (for Prometheus, both Pods are recreated)

Additional info:

 
  • See why prometheus-operator doesn't use cascade=orphan for deletion (to keep the Pods around and avoid downtime).
  • Maybe other statefulsets are recreated as well (alertmanager etc.); maybe removing the --labels option will fix it for all of them (they are all created by the operator).
  • See if we touched the matchLabels of other statefulsets outside the control of the prometheus-operator.
  • See if we can add an origin test to make sure StatefulSets (and maybe other resources) are not recreated. Can we really live with that (what if we really want to change an immutable field)? Maybe in origin we can specify upgrade versions.

Description of problem:

Pods are being terminated on Kubelet restart if they consume any device.

In the case of CNV these Pods are carrying VMs, and the assumption is that Kubelet will not terminate the Pod in this case.

Version-Release number of selected component (if applicable):

4.14 / 4.13.z / 4.12.z

How reproducible:

This should be reproducible with any device plugin, as far as I understand.

Steps to Reproduce:

1. Create Pod requesting device plugin
2. Restart Kubelet
3.

Actual results:

Admission error -> Pod terminates

Expected results:

No error -> Existing & Running Pods will continue running after Kubelet restart

Additional info:

The culprit seems to be https://github.com/kubernetes/kubernetes/pull/116376

Description of problem:

Updating the CPMS vmSize on Azure Stack Hub fails when provisioning the new control plane node with the error "The value 1024 of parameter 'osDisk.diskSizeGB' is out of range. The value must be between '1' and '1023', inclusive." Target="osDisk.diskSizeGB". After changing diskSizeGB to 1023, new nodes are provisioned. However, for a fresh install the default diskSizeGB for masters is 1024.

Version-Release number of selected component (if applicable):

4.13.0-0.nightly-2023-01-27-165107

How reproducible:

Always

Steps to Reproduce:

1. Update cpms vmSize to Standard_DS3_v2
2. Check new machine state
$ oc get machine  
NAME                                PHASE     TYPE              REGION   ZONE   AGE
jima28b-r9zht-master-h7g67-1        Running   Standard_DS5_v2   mtcazs          11h
jima28b-r9zht-master-hhfzl-0        Failed                                      24s
jima28b-r9zht-master-qtb9j-0        Running   Standard_DS5_v2   mtcazs          11h
jima28b-r9zht-master-tprc7-2        Running   Standard_DS5_v2   mtcazs          11h

$ oc get machine jima28b-r9zht-master-hhfzl-0 -o yaml
  errorMessage: 'failed to reconcile machine "jima28b-r9zht-master-hhfzl-0": failed
    to create vm jima28b-r9zht-master-hhfzl-0: failure sending request for machine
    jima28b-r9zht-master-hhfzl-0: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate:
    Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter"
    Message="The value 1024 of parameter ''osDisk.diskSizeGB'' is out of range. The
    value must be between ''1'' and ''1023'', inclusive." Target="osDisk.diskSizeGB"'
  errorReason: InvalidConfiguration
  lastUpdated: "2023-01-29T02:35:13Z"
  phase: Failed
  providerStatus:
    conditions:
    - lastTransitionTime: "2023-01-29T02:35:13Z"
      message: 'failed to create vm jima28b-r9zht-master-hhfzl-0: failure sending
        request for machine jima28b-r9zht-master-hhfzl-0: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate:
        Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter"
        Message="The value 1024 of parameter ''osDisk.diskSizeGB'' is out of range.
        The value must be between ''1'' and ''1023'', inclusive." Target="osDisk.diskSizeGB"'
      reason: MachineCreationFailed
      status: "False"
      type: MachineCreated
    metadata: {}
3. Check logs
$ oc logs -f machine-api-controllers-84444d49f-mlldl -c machine-controller
I0129 02:35:15.047784       1 recorder.go:103] events "msg"="InvalidConfiguration: failed to reconcile machine \"jima28b-r9zht-master-hhfzl-0\": failed to create vm jima28b-r9zht-master-hhfzl-0: failure sending request for machine jima28b-r9zht-master-hhfzl-0: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code=\"InvalidParameter\" Message=\"The value 1024 of parameter 'osDisk.diskSizeGB' is out of range. The value must be between '1' and '1023', inclusive.\" Target=\"osDisk.diskSizeGB\"" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"jima28b-r9zht-master-hhfzl-0","uid":"6cb07114-41a6-40bc-8e83-d9f27931bc8c","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"451889"} "reason"="FailedCreate" "type"="Warning"

 $ oc logs -f control-plane-machine-set-operator-69b756df4f-skv4x E0129 02:35:13.282358       1 controller.go:818]  "msg"="Observed failed replacement control plane machines" "error"="found replacement control plane machines in an error state, the following machines(s) are currently reporting an error: jima28b-r9zht-master-hhfzl-0" "controller"="controlplanemachineset" "failedReplacements"="jima28b-r9zht-master-hhfzl-0" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="a988d699-8ddc-4880-9930-0db64ca51653" I0129 02:35:13.282380       1 controller.go:264]  "msg"="Cluster state is degraded. The control plane machine set will not take any action until issues have been resolved." "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="a988d699-8ddc-4880-9930-0db64ca51653" 
4. Change diskSizeGB to 1023, new machine Provisioned.
            osDisk:
              diskSettings: {}
              diskSizeGB: 1023

$ oc get machine                  
NAME                                PHASE      TYPE              REGION   ZONE   AGE
jima28b-r9zht-master-h7g67-1        Running    Standard_DS5_v2   mtcazs          11h
jima28b-r9zht-master-hhfzl-0        Deleting                                     7m1s
jima28b-r9zht-master-qtb9j-0        Running    Standard_DS5_v2   mtcazs          12h
jima28b-r9zht-master-tprc7-2        Running    Standard_DS5_v2   mtcazs          11h
jima28b-r9zht-worker-mtcazs-p8d79   Running    Standard_DS3_v2   mtcazs          18h
jima28b-r9zht-worker-mtcazs-x5gvh   Running    Standard_DS3_v2   mtcazs          18h
jima28b-r9zht-worker-mtcazs-xmdvw   Running    Standard_DS3_v2   mtcazs          18h
$ oc get machine        
NAME                                PHASE         TYPE              REGION   ZONE   AGE
jima28b-r9zht-master-h7g67-1        Running       Standard_DS5_v2   mtcazs          11h
jima28b-r9zht-master-qtb9j-0        Running       Standard_DS5_v2   mtcazs          12h
jima28b-r9zht-master-tprc7-2        Running       Standard_DS5_v2   mtcazs          11h
jima28b-r9zht-master-vqd7r-0        Provisioned   Standard_DS3_v2   mtcazs          16s
jima28b-r9zht-worker-mtcazs-p8d79   Running       Standard_DS3_v2   mtcazs          18h
jima28b-r9zht-worker-mtcazs-x5gvh   Running       Standard_DS3_v2   mtcazs          18h
jima28b-r9zht-worker-mtcazs-xmdvw   Running       Standard_DS3_v2   mtcazs          18h

Actual results:

For a fresh install, the default diskSizeGB for masters is 1024. But after updating the CPMS vmSize, the new master fails to be created, reporting the error "The value 1024 of parameter 'osDisk.diskSizeGB' is out of range. The value must be between '1' and '1023', inclusive".
When changing diskSizeGB to 1023, the new machine gets Provisioned.

Expected results:

A new master should be created when changing the vmSize, without needing to update diskSizeGB to 1023.

Additional info:

Minimum recommendation for control plane nodes is 1024 GB
https://docs.openshift.com/container-platform/4.12/installing/installing_azure_stack_hub/installing-azure-stack-hub-network-customizations.html#installation-azure-stack-hub-config-yaml_installing-azure-stack-hub-network-customizations

Description of problem:

Some of the components in the Console Dynamic Plugin SDK take the `GroupVersionKind` type, which is a string, for the `groupVersionKind` prop; they should instead be using the new `K8sGroupVersionKind` object.

Version-Release number of selected component (if applicable):

 

How reproducible:

always

Description of problem:

Visiting the global configurations page returns an error after 'Red Hat OpenShift Serverless' is installed; the error persists even after the operator is uninstalled.

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-06-06-212044

How reproducible:

Always

Steps to Reproduce:

1. Subscribe 'Red Hat OpenShift Serverless' from OperatorHub, wait for the operator to be successfully installed
2. Visit Administration -> Cluster Settings -> Configurations tab

Actual results:

react_devtools_backend_compact.js:2367 unhandled promise rejection: TypeError: Cannot read properties of undefined (reading 'apiGroup') 
    at r (main-chunk-e70ea3b3d562514df486.min.js:1:1)
    at main-chunk-e70ea3b3d562514df486.min.js:1:1
    at Array.map (<anonymous>)
    at main-chunk-e70ea3b3d562514df486.min.js:1:1
overrideMethod @ react_devtools_backend_compact.js:2367
window.onunhandledrejection @ main-chunk-e70ea3b3d562514df486.min.js:1

main-chunk-e70ea3b3d562514df486.min.js:1 Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'apiGroup')
    at r (main-chunk-e70ea3b3d562514df486.min.js:1:1)
    at main-chunk-e70ea3b3d562514df486.min.js:1:1
    at Array.map (<anonymous>)
    at main-chunk-e70ea3b3d562514df486.min.js:1:1

 

Expected results:

no errors

Additional info:

 

Description of problem:

The HCP Create NodePool AWS Render command does not work correctly since it does not render a specification with the arch and instance type defined.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

No arch or instance type defined in specification.

Expected results:

Arch and instance type defined in specification.

Additional info:

 

This is a clone of issue OCPBUGS-20499. The following is the description of the original issue:

This test triggers failures shortly after node reboot. Of course the node isn't ready, it rebooted.

: [sig-node] nodes should not go unready after being upgraded and go unready only once

{ 1 nodes violated upgrade expectations: Node ci-op-q38yw8yd-8aaeb-lsqxj-master-0 went unready multiple times: 2023-10-11T21:58:45Z, 2023-10-11T22:05:45Z Node ci-op-q38yw8yd-8aaeb-lsqxj-master-0 went ready multiple times: 2023-10-11T21:58:46Z, 2023-10-11T22:07:18Z }

Both of those times, the master-0 was rebooted or being rebooted.

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-network-operator/2060/pull-ci-openshift-cluster-network-operator-master-e2e-gcp-ovn-upgrade/1712203703311667200

The OCP FeatureGate object gets a new status field where the enabled feature gates are listed. We should use this new field instead of parsing FeatureGate.Spec (a sketch follows the enhancement link below).

This should be fully transparent to users, they still set FeatureGate.Spec and they should still observe that SharedResource CSI driver + operator is installed when they enable TechPreviewNoUpgrade feature set there.

Enhancement: https://github.com/openshift/cluster-storage-operator/pull/368
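A minimal sketch of reading the status field, assuming the 4.14 openshift/api shape (FeatureGate.Status.FeatureGates with per-version Enabled/Disabled lists); field names are my understanding of that API, not the cluster-storage-operator implementation:

package featuregates

import (
	configv1 "github.com/openshift/api/config/v1"
)

// enabledInStatus reports whether a gate is listed as enabled in the
// FeatureGate status for the given payload version, instead of deriving
// it from spec.featureSet.
func enabledInStatus(fg *configv1.FeatureGate, version string, name configv1.FeatureGateName) bool {
	for _, details := range fg.Status.FeatureGates {
		if details.Version != version {
			continue
		}
		for _, g := range details.Enabled {
			if g.Name == name {
				return true
			}
		}
	}
	return false
}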

I tried upgrading a 4.14 SNO cluster from one nightly image to another and, while on AWS the upgrade works fine, it fails on GCP.

Cluster Network Operator successfully upgrades ovn-kubernetes, but is stuck on the cloud network config controller, which is in CrashLoopBackOff state because it receives a wrong IP address from the name server when trying to reach the API server. The node IP is actually 10.0.0.3 and the name server returns 10.0.0.2, which I suspect is the bootstrap node IP, but that's only my guess.

Some relevant logs:

 

$ oc get co network
network                                    4.14.0-0.nightly-2023-08-15-200133   True        True          False      86m     Deployment "/openshift-cloud-network-config-controller/cloud-network-config-controller" is not available (awaiting 1 nodes)

$ oc get pods -n openshift-ovn-kubernetes -o wide
NAME                                     READY   STATUS    RESTARTS       AGE   IP         NODE                                 NOMINATED NODE   READINESS GATES
ovnkube-control-plane-844c8f76fb-q4tvp   2/2     Running   3              24m   10.0.0.3   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>
ovnkube-node-24kb7                       10/10   Running   12 (13m ago)   25m   10.0.0.3   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>

$ oc get pods -n openshift-cloud-network-config-controller -o wide
openshift-cloud-network-config-controller          cloud-network-config-controller-d65ccbc5b-dnt69               0/1     CrashLoopBackOff   15 (2m37s ago)   40m    10.128.0.141   ci-ln-rij2p1b-72292-xmzf4-master-0   <none>           <none>

$ oc logs -n openshift-cloud-network-config-controller cloud-network-config-controller-d65ccbc5b-dnt69
W0816 11:06:00.666825       1 client_config.go:618] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
F0816 11:06:30.673952       1 main.go:345] Error building controller runtime client: Get "https://api-int.ci-ln-rij2p1b-72292.gcp-2.ci.openshift.org:6443/api?timeout=32s": dial tcp 10.0.0.2:6443: i/o timeout

 

I also get 10.0.0.2 if I run a DNS query from the node itself or from a pod:

dig api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org
...
;; ANSWER SECTION:
api-int.ci-ln-zp7dbyt-72292.gcp-2.ci.openshift.org. 60 IN A 10.0.0.2

 

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Always.

Steps to Reproduce:

1.on clusterbot: launch 4.14 gcp,single-node
2. on a terminal: oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-08-15-200133 --allow-explicit-upgrade --force

Actual results:

name server returns 10.0.0.2, so CNCC fails to reach the API server

Expected results:

name server should return 10.0.0.3

 

Must-gather: https://drive.google.com/file/d/1MDbsMgIQz7dE6e76z4ad95dwaxbSNrJM/view?usp=sharing

I'm assigning this bug first to the network edge team for a first pass. Please do reassign it if necessary.

 

Description of problem:

When attempting to add nodes to a long-lived 4.12.3 cluster, net new nodes are not able to join the cluster. They are provisioned in the cloud provider (AWS), but never actually join as a node.

Version-Release number of selected component (if applicable):

4.12.3

How reproducible:

Consistent

Steps to Reproduce:

1. On a long lived cluster, add a new machineset

Actual results:

Machines reach "Provisioned" but don't join the cluster

Expected results:

Machines join cluster as nodes

Additional info:


Due to removal of in-tree AWS provider https://github.com/kubernetes/kubernetes/pull/115838 we need to ensure that KCM is setting --external-cloud-volume-plugin flag accordingly, especially that the CSI migration was GA-ed in 4.12/1.25.

Description of problem:

`cluster-reader` ClusterRole should have ["get", "list", "watch"] permissions for a number of privileged CRs, but lacks them for the API Group "k8s.ovn.org", which includes CRs such as EgressFirewalls, EgressIPs, etc.

Version-Release number of selected component (if applicable):

OCP 4.10 - 4.12 OVN

How reproducible:

Always

Steps to Reproduce:

1. Create a cluster with OVN components, e.g. EgressFirewall
2. Check permissions of ClusterRole `cluster-reader`

Actual results:

No permissions for OVN resources 

Expected results:

Get, list, and watch verb permissions for OVN resources

Additional info:

Looks like a similar bug was opened for "network-attachment-definitions" in OCPBUGS-6959 (whose closure is being contested).
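The expected fix is read-only access to the k8s.ovn.org group for cluster-reader. A minimal sketch in Go of such a rule is below; the aggregation label and the exact resource list are assumptions for illustration, not the actual shipped RBAC manifest:

package main

import (
	"fmt"

	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Read-only rule for the k8s.ovn.org CRs named in this bug.
	// The aggregation label is an assumption about how cluster-reader
	// is extended; adjust to however the operator actually ships RBAC.
	role := rbacv1.ClusterRole{
		TypeMeta: metav1.TypeMeta{APIVersion: "rbac.authorization.k8s.io/v1", Kind: "ClusterRole"},
		ObjectMeta: metav1.ObjectMeta{
			Name: "ovn-kubernetes-reader",
			Labels: map[string]string{
				"rbac.authorization.k8s.io/aggregate-to-cluster-reader": "true",
			},
		},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{"k8s.ovn.org"},
			Resources: []string{"egressfirewalls", "egressips"},
			Verbs:     []string{"get", "list", "watch"},
		}},
	}
	out, err := yaml.Marshal(role)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}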

This is a clone of issue OCPBUGS-18498. The following is the description of the original issue:

Description of problem:

If the Build and DeploymentConfig capabilities are not installed, then when using `oc new-app registry.redhat.io/<namespace>/<image>:<tag>` the created Deployment has an empty spec.containers[0].image. The Deployment will fail to start a pod.

Version-Release number of selected component (if applicable):

oc version
Client Version: 4.14.0-0.nightly-2023-08-22-221456
Kustomize Version: v5.0.1
Server Version: 4.14.0-0.nightly-2023-09-02-132842
Kubernetes Version: v1.27.4+2c83a9f

How reproducible:

Always

Steps to Reproduce:

1. Install a cluster without the Build/DeploymentConfig capabilities
(set "baselineCapabilitySet: None" in the install-config)
2. Create a deployment using the 'new-app' command
oc new-app registry.redhat.io/ubi8/httpd-24:latest
3.

Actual results:

2.
$oc new-app registry.redhat.io/ubi8/httpd-24:latest
--> Found container image c412709 (11 days old) from registry.redhat.io for "registry.redhat.io/ubi8/httpd-24:latest"    Apache httpd 2.4
    ----------------
    Apache httpd 2.4 available as container, is a powerful, efficient, and extensible web server. Apache supports a variety of features, many implemented as compiled modules which extend the core functionality. These can range from server-side programming language support to authentication schemes. Virtual hosting allows one Apache installation to serve many different Web sites.    Tags: builder, httpd, httpd-24    * An image stream tag will be created as "httpd-24:latest" that will track this image--> Creating resources ...
    imagestream.image.openshift.io "httpd-24" created
    deployment.apps "httpd-24" created
    service "httpd-24" created
--> Success
    Application is not exposed. You can expose services to the outside world by executing one or more of the commands below:
     'oc expose service/httpd-24'
    Run 'oc status' to view your app

3. oc get deploy -o yaml
 apiVersion: v1
items:
- apiVersion: apps/v1
  kind: Deployment
  metadata:
    annotations:
      deployment.kubernetes.io/revision: "1"
      image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"httpd-24:latest"},"fieldPath":"spec.template.spec.containers[?(@.name==\"httpd-24\")].image"}]'
      openshift.io/generated-by: OpenShiftNewApp
    creationTimestamp: "2023-09-04T07:44:01Z"
    generation: 1
    labels:
      app: httpd-24
      app.kubernetes.io/component: httpd-24
      app.kubernetes.io/instance: httpd-24
    name: httpd-24
    namespace: wxg
    resourceVersion: "115441"
    uid: 909d0c4e-180c-4f88-8fb5-93c927839903
  spec:
    progressDeadlineSeconds: 600
    replicas: 1
    revisionHistoryLimit: 10
    selector:
      matchLabels:
        deployment: httpd-24
    strategy:
      rollingUpdate:
        maxSurge: 25%
        maxUnavailable: 25%
      type: RollingUpdate
    template:
      metadata:
        annotations:
          openshift.io/generated-by: OpenShiftNewApp
        creationTimestamp: null
        labels:
          deployment: httpd-24
      spec:
        containers:
        - image: ' '
          imagePullPolicy: IfNotPresent
          name: httpd-24
          ports:
          - containerPort: 8080
            protocol: TCP
          - containerPort: 8443
            protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
  status:
    conditions:
    - lastTransitionTime: "2023-09-04T07:44:01Z"
      lastUpdateTime: "2023-09-04T07:44:01Z"
      message: Created new replica set "httpd-24-7f6b55cc85"
      reason: NewReplicaSetCreated
      status: "True"
      type: Progressing
    - lastTransitionTime: "2023-09-04T07:44:01Z"
      lastUpdateTime: "2023-09-04T07:44:01Z"
      message: Deployment does not have minimum availability.
      reason: MinimumReplicasUnavailable
      status: "False"
      type: Available
    - lastTransitionTime: "2023-09-04T07:44:01Z"
      lastUpdateTime: "2023-09-04T07:44:01Z"
      message: 'Pod "httpd-24-7f6b55cc85-pvvgt" is invalid: spec.containers[0].image:
        Invalid value: " ": must not have leading or trailing whitespace'
      reason: FailedCreate
      status: "True"
      type: ReplicaFailure
    observedGeneration: 1
    unavailableReplicas: 1
kind: List
metadata:

Expected results:

Should set spec.containers[0].image to registry.redhat.io/ubi8/httpd-24:latest

Additional info:

 

This is a clone of issue OCPBUGS-19314. The following is the description of the original issue:

Description

As a user, I don't want to see the "DeploymentConfigs" option in the user settings when DeploymentConfigs are not installed in the cluster.

Acceptance Criteria

  1. Hide the DeploymentConfig option as the Default Resource Type when its not installed

Additional Details:

Description

As a user, I would like to see the type of technology used by the samples on the samples view similar to the all services view. 

On the samples view:

It is showing different types of samples, e.g. Devfile and Helm, all displayed as .NET. It is difficult for the user to decide which .NET entry to select from the list. We need something like the all-services view, where the type of technology is shown at the top right of each card so users can differentiate between the entries:

Acceptance Criteria

  1. Add visible label as the all services view on each card to show the technology used by the sample on the samples view.

Additional Details:

Description of problem:

After a manual crash of an OCP node, the OSPD VM running on that node is stuck in the Terminating state.

Version-Release number of selected component (if applicable):

OCP 4.12.15 
osp-director-operator.v1.3.0
kubevirt-hyperconverged-operator.v4.12.5

How reproducible:

Log in to an OCP 4.12.15 node running a VM.
Manually crash the master node.
After reboot the VM stays in the Terminating state.

Steps to Reproduce:

    1. ssh core@masterX 
    2. sudo su
    3. echo c > /proc/sysrq-trigger     

Actual results:

After reboot the VM stays in the Terminating state.


$ omc get node|sed -e 's/modl4osp03ctl/model/g' | sed -e 's/telecom.tcnz.net/aaa.bbb.ccc/g'
NAME                               STATUS   ROLES                         AGE   VERSION
model01.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
model02.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08
model03.aaa.bbb.ccc   Ready    control-plane,master,worker   91d   v1.25.8+37a9a08


$ omc get pod -n openstack 
NAME                                                        READY   STATUS         RESTARTS   AGE
openstack-provision-server-7b79fcc4bd-x8kkz                 2/2     Running        0          8h
openstackclient                                             1/1     Running        0          7h
osp-director-operator-controller-manager-5896b5766b-sc7vm   2/2     Running        0          8h
osp-director-operator-index-qxxvw                           1/1     Running        0          8h
virt-launcher-controller-0-9xpj7                            1/1     Running        0          20d
virt-launcher-controller-1-5hj9x                            1/1     Running        0          20d
virt-launcher-controller-2-vhd69                            0/1     NodeAffinity   0          43d

$ omc describe  pod virt-launcher-controller-2-vhd69 |grep Status:
Status:                    Terminating (lasts 37h)

$ xsos sosreport-xxxx/|grep time
...
  Boot time: Wed Nov 22 01:44:11 AM UTC 2023
  Uptime:    8:27,  0 users
  

Expected results:

The VM restarts automatically OR does not stay in the Terminating state.

Additional info:

The issue has been seen two times.

The first time, a kernel crash occurred and the associated VM on that node was left in the Terminating state.

The second time we tried to reproduce the issue by manually crashing the kernel and got the same result:
the VM running on the OCP node stays in the Terminating state.

AWS Local Zone Support for OCP UPI/IPI

Current AWS-based OCP deployment models do not address Local Zones, which offer lower latency and geo-proximity to OCP cluster consumers.

OCP install support for AWS Local Zones will address customer segments where low-latency and data-locality requirements are a deal breaker/show-stopper for our sales teams' engagements.

Description of the problem:

While scale testing ACM 2.8, sometimes 0 of the SNOs are discovered. Upon review, the agent on the SNOs is attempting to return the inspection data to the API VIP address instead of the IP address of the node hosting the metal3 pod. Presumably, in the cases where the agents were discovered, the API VIP address happened to be on the same node as the metal3 pod.

How reproducible:

Roughly 66% of the time you could encounter this with a 3-node cluster.

Steps to reproduce:

1.

2.

3.

Actual results:

Ironic agents attempting to access "fc00:1004::3" which is the API vip address

2023-03-12 17:52:51.441 1 CRITICAL ironic-python-agent [-] Unhandled error: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='fc00:1004::3', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f94354114c0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))                                                    
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent Traceback (most recent call last):
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 169, in _new_conn                                                                       
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     conn = connection.create_connection(
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 96, in create_connection                                                           
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     raise err
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/urllib3/util/connection.py", line 86, in create_connection                                                           
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     sock.connect(sa)
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/eventlet/greenio/base.py", line 253, in connect                                                                      
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     socket_checkerr(fd)
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/eventlet/greenio/base.py", line 51, in socket_checkerr                                                               
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     raise socket.error(err, errno.errorcode[err])
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent ConnectionRefusedError: [Errno 111] ECONNREFUSED
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred:                                                                                           
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent Traceback (most recent call last):
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 699, in urlopen                                                                     
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     httplib_response = self._make_request(
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 382, in _make_request                                                               
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     self._validate_conn(conn)
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn                                                             
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     conn.connect()
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 353, in connect                                                                         
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     conn = self._new_conn()
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/urllib3/connection.py", line 181, in _new_conn                                                                       
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     raise NewConnectionError(
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x7f94354114c0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred:                                                                                           
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent Traceback (most recent call last):
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/requests/adapters.py", line 439, in send                                                                             
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     resp = conn.urlopen(
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/urllib3/connectionpool.py", line 755, in urlopen                                                                     
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     retries = retries.increment(
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/urllib3/util/retry.py", line 574, in increment                                                                       
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     raise MaxRetryError(_pool, url, error or ResponseError(cause))                                                                                            
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='fc00:1004::3', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f94354114c0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))                                                                               
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent During handling of the above exception, another exception occurred:                                                                                           
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent Traceback (most recent call last):
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/bin/ironic-python-agent", line 10, in <module>                                                                                                   
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     sys.exit(run())
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/ironic_python_agent/cmd/agent.py", line 50, in run                                                                   
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     agent.IronicPythonAgent(CONF.api_url,
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/ironic_python_agent/agent.py", line 471, in run                                                                      
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     uuid = inspector.inspect()
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/ironic_python_agent/inspector.py", line 106, in inspect                                                              
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     resp = call_inspector(data, failures)
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/ironic_python_agent/inspector.py", line 145, in call_inspector                                                       
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     resp = _post_to_inspector()
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/tenacity/__init__.py", line 329, in wrapped_f                                                                        
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     return self.call(f, *args, **kw)
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/tenacity/__init__.py", line 409, in call                                                                             
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     do = self.iter(retry_state=retry_state)
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/tenacity/__init__.py", line 368, in iter                                                                             
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     raise retry_exc.reraise()
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/tenacity/__init__.py", line 186, in reraise                                                                          
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     raise self.last_attempt.result()
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 439, in result                                                                                
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     return self.__get_result()
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 391, in __get_result                                                                          
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     raise self._exception
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/tenacity/__init__.py", line 412, in call                                                                             
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     result = fn(*args, **kwargs)
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/ironic_python_agent/inspector.py", line 142, in _post_to_inspector                                                   
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     return requests.post(CONF.inspection_callback_url, data=data,                                                                                             
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/requests/api.py", line 119, in post                                                                                  
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     return request('post', url, data=data, json=json, **kwargs)                                                                                               
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/requests/api.py", line 61, in request                                                                                
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     return session.request(method=method, url=url, **kwargs)                                                                                                  
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/requests/sessions.py", line 542, in request                                                                          
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     resp = self.send(prep, **send_kwargs)
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/requests/sessions.py", line 655, in send                                                                             
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     r = adapter.send(request, **kwargs)
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent   File "/usr/lib/python3.9/site-packages/requests/adapters.py", line 516, in send                                                                             
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent     raise ConnectionError(e, request=request)
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent requests.exceptions.ConnectionError: HTTPSConnectionPool(host='fc00:1004::3', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f94354114c0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))                                                                            
2023-03-12 17:52:51.441 1 ERROR ironic-python-agent

You can see the metal3 pod node and ip address:

# oc get po -n openshift-machine-api metal3-5cc95d74d8-lqd9x -o wide
NAME                      READY   STATUS    RESTARTS   AGE     IP             NODE               NOMINATED NODE   READINESS GATES
metal3-5cc95d74d8-lqd9x   5/5     Running   0          2d16h   fc00:1004::7   e27-h05-000-r650   <none>           <none> 

The addresses on the e27-h05-000-r650 node:

[root@e27-h05-000-r650 ~]# ip a | grep "fc00"
    inet6 fc00:1004::4/128 scope global nodad deprecated
    inet6 fc00:1004::7/64 scope global noprefixroute

You can see the api VIP is actually on this host:

[root@e27-h03-000-r650 ~]# ip a | grep "fc00"
    inet6 fc00:1004::3/128 scope global nodad deprecated 
    inet6 fc00:1004::6/64 scope global noprefixroute 
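
As a quick sanity check (a minimal sketch, assuming the VIP and inspector port values from the log above, fc00:1004::3 and 5050), you can verify from the failing host which node holds the VIP and whether the inspector endpoint is reachable:

# Which node currently holds the API VIP?
ip -6 addr show | grep 'fc00:1004::3'

# Is anything listening on the inspector port on the VIP holder?
ss -6 -ltnp | grep ':5050'

# Can the failing host reach the inspector endpoint at all?
# (-g allows the bracketed IPv6 literal, -k skips certificate verification)
curl -gkv "https://[fc00:1004::3]:5050/v1/continue" --max-time 5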

 

Expected results:

 

Versions:

Hub and SNO OCP 4.12.2

ACM - 2.8.0-DOWNSTREAM-2023-02-28-23-06-27

Description of the problem:

EnsureOperatorPrerequisite uses the cluster CPU architecture, but on a multi-arch cluster the CPU architecture will always be multi. On cluster update, EnsureOperatorPrerequisite will not prevent the cluster from being updated, but it will fail on the next update request.

 

Steps to reproduce:

1. Register multi arch cluster (P or Z)

2. Update cluster with ODF operator 

3. Update any cluster field

 

Actual results:

The cluster fails to update on the second attempt

 

Expected results:

Not to fail

Description of problem:

Following doc [1] to assign a custom role with the minimum permissions for destroying a cluster to the installer Service Principal.

Because that doc omits read permission on the public and private DNS zones needed for destroying an IPI cluster, the installer has no permission to remove the public DNS records.

However, the installer destroy completes without any warning message.
$ ./openshift-install destroy cluster --dir ipi --log-level debug
DEBUG OpenShift Installer 4.13.0-0.nightly-2023-02-16-120330 
DEBUG Built from commit c0bf49ca9e83fd00dfdfbbdddd47fbe6b5cdd510 
INFO Credentials loaded from file "/home/fedora/.azure/osServicePrincipal.json" 
DEBUG deleting public records                      
DEBUG deleting resource group                      
INFO deleted                                       resource group=jima-ipi-role-l7qgz-rg
DEBUG deleting application registrations           
DEBUG Purging asset "Metadata" from disk           
DEBUG Purging asset "Master Ignition Customization Check" from disk 
DEBUG Purging asset "Worker Ignition Customization Check" from disk 
DEBUG Purging asset "Terraform Variables" from disk 
DEBUG Purging asset "Kubeconfig Admin Client" from disk 
DEBUG Purging asset "Kubeadmin Password" from disk 
DEBUG Purging asset "Certificate (journal-gatewayd)" from disk 
DEBUG Purging asset "Cluster" from disk            
INFO Time elapsed: 6m16s                          
INFO Uninstallation complete!                     

$ az network dns record-set a list --resource-group os4-common --zone-name qe.azure.devcluster.openshift.com  -o table| grep jima-ipi-role
*.apps.jima-ipi-role                                       os4-common       30     A       kubernetes.io_cluster.jima-ipi-role-l7qgz="owned"

$ az network dns record-set cname list --resource-group os4-common --zone-name qe.azure.devcluster.openshift.com  -o table| grep jima-ipi-role
api.jima-ipi-role                 os4-common       300    CNAME   kubernetes.io_cluster.jima-ipi-role-l7qgz="owned"

[1] https://docs.google.com/document/d/1iEs7T09Opj0iMXvpKeSatsAyPoda_gWQvFKQuWA3QdM/edit#

Version-Release number of selected component (if applicable):

4.13 nightly build

How reproducible:

always

Steps to Reproduce:

1. Create a custom role with limited permissions for destroying the cluster, without read permission on the public and private DNS zones.
2. Assign the custom role to Service Principal
3. Use this SP to destroy cluster

Actual results:

Although some permissions are missing, the installer destroy completes without any warning.

Expected results:

The installer should emit a warning message indicating which resources were left over and for what specific reason, so that the user can follow up.

Additional info:
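
As a rough illustration only (the exact Azure actions the role needs are an assumption, and the record names are taken from the output above), the custom role would need DNS read access for the destroy to find the records, and the leftovers can be removed manually with the az CLI:

# Hypothetical actions to add to the custom role so destroy can enumerate the records:
#   Microsoft.Network/dnsZones/read
#   Microsoft.Network/privateDnsZones/read

# Manual cleanup of the leftover records listed above:
az network dns record-set a delete \
  --resource-group os4-common \
  --zone-name qe.azure.devcluster.openshift.com \
  --name '*.apps.jima-ipi-role' --yes

az network dns record-set cname delete \
  --resource-group os4-common \
  --zone-name qe.azure.devcluster.openshift.com \
  --name api.jima-ipi-role --yes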

 

 

 

 

 

 

Description of problem:

The alerts table displays incorrect values (Prometheus) in the source column 

Version-Release number of selected component (if applicable):

4.13

How reproducible:

Always

Steps to Reproduce:

1. Install LokiOperator, Cluster Logging operator and enable the logging view plugin with the alerts feature toggle enabled
2. Add a log-based alert
3. Check the alerts table source in the observe -> alerts section

Actual results:

Incorrect "Prometheus" value is displayed for non log-based alerts

Expected results:

"Platform" or "User" value is displayed for non log-based alerts

Additional info:

 

Description of problem:

Sync "Debug in Terminal" feature with 3.x pods in web console
The types of pods that enable the "Debug in terminal" feature should be in alignment with those in v3.11. See code here: https://github.com/openshift/origin-web-console/blob/c37982397087036321312172282e139da378eff2/app/scripts/directives/resources.js#L33-L53

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of the problem:

In RHEL 8, the arping command (from iputils-s20180629) only returns 1 when used for duplicate address detection. In all other modes it returns 0 on success; 2 or -1 on error.

In RHEL 9, the arping command (from iputils 20210202) also returns 1 in other modes, essentially at random. (There is some kind of theory behind it, but even after multiple fixes to the logic it does not remotely work in any consistent way.)

How reproducible:

60-100% for individual arping commands

100% installation failure

Steps to reproduce:

  1. Build the agent container using RHEL 9 as the base image
  2. arping -c 10 -w 5 -I enp2s0 192.168.111.1; echo $?

Actual results:

arping returns 1

journal on the discovery ISO shows:

Jul 19 04:35:38 master-0 next_step_runne[3624]: time="19-07-2023 04:35:38" level=error msg="Error while processing 'arping' command" file="ipv4_arping_checker.go:28" error="exit status 1"

all hosts are marked invalid and install fails.

Expected results:

Ideally, arping returns 0.

Failing that, we should treat both 0 and 1 as success, as previous versions of arping effectively did.
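
A minimal sketch of the tolerant exit-code handling proposed above (interface and IP are taken from the reproduction step; this is not the agent's actual implementation):

# Treat exit codes 0 and 1 as success, anything else as a real failure.
arping -c 10 -w 5 -I enp2s0 192.168.111.1
rc=$?
if [ "$rc" -le 1 ]; then
    echo "arping succeeded (rc=$rc)"
else
    echo "arping failed (rc=$rc)" >&2
    exit "$rc"
fi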

Description of problem:

OVN image pre-puller blocks upgrades in environments where the images have already been pulled but the registry server is not available.

Version-Release number of selected component (if applicable):

4.12

How reproducible:

Always

Steps to Reproduce:

1. Create a cluster in a disconnected environment.

2. Manually pre-pull all the images required for the upgrade. For example, get the list of images needed:

# oc adm release info quay.io/openshift-release-dev/ocp-release:4.12.10-x86_64 -o json > release-info.json

And then pull them in all the nodes of the cluster:

# crio pull $(cat release-info.json | jq -r '.references.spec.tags[].from.name')

3. Stop or somehow make the registry unreachable, then trigger the upgrade.

Actual results:

The upgrade blocks with the following error reported by the cluster version operator:

# oc get clusterversion; oc get co network
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.10   True        True          62m     Working towards 4.12.11: 483 of 830 done (58% complete), waiting on network
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
network   4.12.10   True        True          False      133m    DaemonSet "/openshift-ovn-kubernetes/ovnkube-upgrades-prepuller" is not available (awaiting 1 nodes)

The reason for that is that the `ovnkube-upgrades-prepuller-...` pod uses `imagePullPolicy: Always` and that fails if there is no registry, even if the image has already been pulled:

# oc get pods -n openshift-ovn-kubernetes ovnkube-upgrades-prepuller-5s2cn
NAME                               READY   STATUS             RESTARTS   AGE
ovnkube-upgrades-prepuller-5s2cn   0/1     ImagePullBackOff   0          44m

# oc get events -n openshift-ovn-kubernetes --field-selector involvedObject.kind=Pod,involvedObject.name=ovnkube-upgrades-prepuller-5s2cn,reason=Failed
LAST SEEN   TYPE      REASON   OBJECT                                 MESSAGE
43m         Warning   Failed   pod/ovnkube-upgrades-prepuller-5s2cn   Failed to pull image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52f189797a83cae8769f1a4dc6dfd46d586914575ee99de6566fc23c77282071": rpc error: code = Unknown desc = (Mirrors also failed: [server.home.arpa:8443/openshift/release@sha256:52f189797a83cae8769f1a4dc6dfd46d586914575ee99de6566fc23c77282071: pinging container registry server.home.arpa:8443: Get "https://server.home.arpa:8443/v2/": dial tcp 192.168.100.1:8443: connect: connection refused]): quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52f189797a83cae8769f1a4dc6dfd46d586914575ee99de6566fc23c77282071: pinging container registry quay.io: Get "https://quay.io/v2/": dial tcp: lookup quay.io on 192.168.100.1:53: server misbehaving
43m         Warning   Failed   pod/ovnkube-upgrades-prepuller-5s2cn   Error: ErrImagePull
43m         Warning   Failed   pod/ovnkube-upgrades-prepuller-5s2cn   Error: ImagePullBackOff

# oc get pod -n openshift-ovn-kubernetes ovnkube-upgrades-prepuller-5s2cn -o json | jq -r '.spec.containers[] | .imagePullPolicy + " " + .image'
Always quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:52f189797a83cae8769f1a4dc6dfd46d586914575ee99de6566fc23c77282071

Expected results:

The upgrade should not block.

Additional info:

We detected this in a situation where we want to be able to perform upgrades in a disconnected environment and without the registry server running. See MGMT-13733 for details.
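
One way to confirm the image is already present on a node even though the pull fails (a hedged check; the node name is a placeholder and the digest is taken from the events above):

oc debug node/<node-name> -- chroot /host \
  crictl images --digests | grep 52f189797a83cae8769f1a4dc6dfd46d586914575ee99de6566fc23c77282071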

This is a clone of issue OCPBUGS-19311. The following is the description of the original issue:

Description

As a user, I would like to use the Import from Git form even if I don't have BuildConfig (BC) installed in my cluster, but I have installed the Pipelines operator.

Acceptance Criteria

  1. Show the Import From Git Tab on the Add page if Pipelines Operator is installed and BuildConfig is not installed in the cluster

Additional Details:
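
A rough way to check the cluster state this story targets (these are the standard API groups; whether the console keys off exactly these checks is an assumption):

# BuildConfig API absent:
oc api-resources --api-group=build.openshift.io
# Pipelines (Tekton) APIs present:
oc api-resources --api-group=tekton.dev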

This is a clone of issue OCPBUGS-17906. The following is the description of the original issue:

Description of problem:

On a HyperShift (guest) cluster, the EFS driver pod is stuck in the ContainerCreating state

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-08-11-055332

How reproducible:

Always

Steps to Reproduce:

1. Create Hypershift cluster.    
Flexy template: aos-4_14/ipi-on-aws/versioned-installer-ovn-hypershift-ci

2. Try to install EFS operator and driver from yaml file/web console as mentioned in below steps.  
a) Create iam role from ccoctl tool and will get ROLE ARN value from the output   
b) Install EFS operator using the above ROLE ARN value.   
c) Check EFS operator, node, controller pods are up and running  

// og-sub-hcp.yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  generateName: openshift-cluster-csi-drivers-
  namespace: openshift-cluster-csi-drivers
spec:
  namespaces:
  - ""
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: aws-efs-csi-driver-operator
  namespace: openshift-cluster-csi-drivers
spec:
    channel: stable
    name: aws-efs-csi-driver-operator
    source: qe-app-registry
    sourceNamespace: openshift-marketplace
    config:
      env:
      - name: ROLEARN
        value: arn:aws:iam::301721915996:role/hypershift-ci-16666-openshift-cluster-csi-drivers-aws-efs-cloud-

// driver.yaml
apiVersion: operator.openshift.io/v1
kind: ClusterCSIDriver
metadata:
  name: efs.csi.aws.com
spec:
  logLevel: TraceAll
  managementState: Managed
  operatorLogLevel: TraceAll

Actual results:

aws-efs-csi-driver-controller-699664644f-dkfdk   0/4     ContainerCreating   0          87m

Expected results:

EFS controller pods should be up and running

Additional info:

oc -n openshift-cluster-csi-drivers logs aws-efs-csi-driver-operator-6758c5dc46-b75hb

E0821 08:51:25.160599       1 base_controller.go:266] "AWSEFSDriverCredentialsRequestController" controller failed to sync "key", err: cloudcredential.operator.openshift.io "cluster" not found

Discussion: https://redhat-internal.slack.com/archives/GK0DA0JR5/p1692606247221239
Installation steps epic: https://issues.redhat.com/browse/STOR-1421 
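
For reference, the lookup that fails in the log above can be checked directly (a hedged diagnostic; on a HyperShift guest the resource may simply not exist), along with the stuck controller pod's events:

oc get cloudcredential cluster
oc -n openshift-cluster-csi-drivers describe pod aws-efs-csi-driver-controller-699664644f-dkfdk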

Background

Update the CPMS docs to reflect the newly supported flavours for the upcoming 4.13 release.

Steps

  • Create a PR to update the docs

Stakeholders

  • Cloud Team

Definition of Done

  • PR merged

This is a clone of issue OCPBUGS-21745. The following is the description of the original issue:

Description of problem:

Upon installing 4.14.0-rc.6 into an existing vnet with private load balancer publishing, Services of type LoadBalancer lack the permissions necessary to sync.

Version-Release number of selected component (if applicable):

4.14.0-rc.6

How reproducible:

Seemingly 100%

Steps to Reproduce:

1. Install w/ azure Managed Identity into an existing vnet with private LB publishing
2.
3.

Actual results:

                One or more other status conditions indicate a degraded state: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: Retriable: false, RetryAfter: 0s, HTTPStatusCode: 403, RawError: {"error":{"code":"AuthorizationFailed","message":"The client '194d5669-cb47-4199-a673-4b32a4a110be' with object id '194d5669-cb47-4199-a673-4b32a4a110be' does not have authorization to perform action 'Microsoft.Network/virtualNetworks/subnets/read' over scope '/subscriptions/14b86a40-8d8f-4e69-abaf-42cbb0b8a331/resourceGroups/net/providers/Microsoft.Network/virtualNetworks/rnd-we-net/subnets/paas1' or the scope is invalid. If access was recently granted, please refresh your credentials."}}

Operators dependent on Ingress are failing as well.
authentication                             4.14.0-rc.6   False       False         True       149m    OAuthServerRouteEndpointAccessibleControllerAvailable: Get https://oauth-openshift.apps.cnb10161.rnd.westeurope.example.com/healthz: dial tcp: lookup oauth-openshift.apps.cnb10161.rnd.westeurope.example.com on 10.224.0.10:53: no such host (this is likely result of malfunctioning DNS server)
console                                    4.14.0-rc.6   False       True          False      142m    DeploymentAvailable: 0 replicas available for console deployment...

 

Expected results:

Successful install

Additional info:

The client ID in the error corresponds to “openshift-cloud-controller-manager-azure-cloud-credentials”; checking its Azure managed identity confirms that it only has access to the cluster RG, not the network RG.

Additionally, this permission is granted to the MAPI roles but not to the CCM roles.
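
A possible remediation sketch, assuming the built-in Network Contributor role is acceptable (least-privilege alternatives may be preferred): grant the CCM identity access to the network resource group referenced in the error, using the object ID and subscription from the message above:

az role assignment create \
  --assignee 194d5669-cb47-4199-a673-4b32a4a110be \
  --role "Network Contributor" \
  --scope /subscriptions/14b86a40-8d8f-4e69-abaf-42cbb0b8a331/resourceGroups/net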

 

Description of the problem:

CVO 4.14 failed to install when Nutanix platform provider is selected.

 

 

{
"cluster_id": "c8359d4e-141b-45ff-9979-d49dd679d56b",
"name": "cvo",
"operator_type": "builtin",
"status": "failed",
"status_updated_at": "2023-06-29T07:40:47.855Z",
"timeout_seconds": 3600,
"version": "4.14.0-0.nightly-2023-06-27-233015"
}

 

 

e.g https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-assisted-test-infra-master-e2e-nutanix-assisted-periodic/1674303871989583872

How reproducible:

 

Steps to reproduce:

1.

2.

3.

Actual results:

 

Expected results:

Description of problem:

In an install where users bring their own networks, they also bring their own NSGs. However, the installer still creates an NSG. In Azure environments enforcing rule [1] below, users are prohibited from installing the cluster, because the apiserver_in rule [2] has its source set to 0.0.0.0. Allowing users to define this rule before install would let them set up this connectivity without opening unrestricted inbound access.



[1] - Rule: Network Security Groups shall not allow rule with 0.0.0.0/Any Source/Destination IP Addresses - Custom Deny

[2] - https://github.com/openshift/installer/blob/master/data/data/azure/vnet/nsg.tf#L31
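
A possible post-install workaround sketch (NSG name and allowed CIDR are placeholders; this is not the requested install-time API): narrow the source of the installer-created apiserver_in rule so it no longer uses 0.0.0.0/Any:

az network nsg rule update \
  --resource-group <cluster-resource-group> \
  --nsg-name <infra-id>-nsg \
  --name apiserver_in \
  --source-address-prefixes 203.0.113.0/24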

Description of problem:

[Hypershift] The default KAS PSA config should be consistent with OCP:
enforce: privileged

Version-Release number of selected component (if applicable):

Cluster version is 4.14.0-0.nightly-2023-10-08-220853

How reproducible:

Always

Steps to Reproduce:

1. Install OCP cluster and hypershift operator
2. Create hosted cluster
3. Check the default kas config of the hosted cluster

Actual results:

The hosted cluster default kas PSA config enforce is 'restricted'
$ jq '.admission.pluginConfig.PodSecurity' < `oc extract cm/kas-config -n clusters-9cb7724d8bdd0c16a113 --confirm`
{
  "location": "",
  "configuration": {
    "kind": "PodSecurityConfiguration",
    "apiVersion": "pod-security.admission.config.k8s.io/v1beta1",
    "defaults": {
      "enforce": "restricted",
      "enforce-version": "latest",
      "audit": "restricted",
      "audit-version": "latest",
      "warn": "restricted",
      "warn-version": "latest"
    },
    "exemptions": {
      "usernames": [
        "system:serviceaccount:openshift-infra:build-controller"
      ]
    }
  }
}

Expected results:

The hosted cluster default KAS PSA config enforce should be 'privileged', as set in:

https://github.com/openshift/hypershift/blob/release-4.13/control-plane-operator/controllers/hostedcontrolplane/kas/config.go#L93

Additional info:

References: OCPBUGS-8710
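
A quick check of the enforce level, reusing the jq path shown above (the namespace is a placeholder for the hosted control plane namespace):

oc extract cm/kas-config -n <hcp-namespace> --to=- | \
  jq -r '.admission.pluginConfig.PodSecurity.configuration.defaults.enforce'
# expected: privileged; actual with this bug: restricted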

This issue has been reported multiple times over the years with no resolution
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.13-e2e-vsphere-zones/1655633815252504576

kubeconfig received!
waiting for api to be available
level=error
level=error msg=Error: failed to parse ovf: failed to parse ovf: XML syntax error on line 1: illegal character code U+0000
level=error
level=error msg= with vsphereprivate_import_ova.import[0],
level=error msg= on main.tf line 70, in resource "vsphereprivate_import_ova" "import":
level=error msg= 70: resource "vsphereprivate_import_ova" "import" {
level=error
level=error
level=error msg=Error: failed to parse ovf: failed to parse ovf: XML syntax error on line 1: illegal character code U+0000

https://issues.redhat.com/browse/OCPQE-13219
https://issues.redhat.com/browse/TRT-741

Description of problem:

When the ODF StorageSystem CR is created through the wizard, LocalVolumeDiscovery does not discover/show devices of mpath type

Version-Release number of selected component (if applicable):

OCP 4.11.31

How reproducible:

All the time

Steps to Reproduce:

1. Get OCP 4.11 running with the LSO and ODF operators
2. Configure and present mpath devices to nodes used for ODF
3. Use the ODF wizard to create a StorageSystem object
4. Inspect the LocalVolumeDiscovery results.

Actual results:

There are no devices of mpath type shown by the ODF wizard / LocalVolumeDiscovery CR

Expected results:

LocalVolumeDiscovery should discover mpath device type 

Additional info:

LocalVolumeSet already works with mpath devices if you manually define them in .spec, as does a LocalVolume pointing at mpath devicePaths
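
For illustration, a minimal LocalVolume sketch using an mpath devicePath (the device name, storage class name and volume mode are placeholders):

cat <<'EOF' | oc apply -f -
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: local-mpath
  namespace: openshift-local-storage
spec:
  storageClassDevices:
    - storageClassName: localblock
      volumeMode: Block
      devicePaths:
        - /dev/mapper/mpatha
EOF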

Description of problem:

Azure managed identity role assignments created using 'ccoctl azure' sub-commands are not cleaned up when running 'ccoctl azure delete'

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

100%

Steps to Reproduce:

1. Create Azure workload identity infrastructure using 'ccoctl azure create-all'
2. Delete Azure workload identity infrastructure using 'ccoctl azure delete'
3. Observe lingering role assignments in either the OIDC resource group if not deleted OR in the DNS Zone resource group if the OIDC resource group is deleted by providing '--delete-oidc-resource-group'. 

Actual results:

Role assignments for managed identities are not deleted following 'ccoctl azure delete'

Expected results:

Role assignments for managed identities are deleted following 'ccoctl azure delete'

Additional info:
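
Until this is fixed, the lingering assignments can be found and removed manually (a hedged cleanup sketch; resource group and assignment ID are placeholders):

az role assignment list --resource-group <oidc-or-dns-zone-rg> -o table
az role assignment delete --ids <role-assignment-id>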

 

Description of problem:

The literal string `Secret {{newImageSecret}} was created.` is shown in the alert for the created image pull secret in the Container Image flow.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1. Navigate +Add page
2. Open the Container Image form
3. click on Create an Image pull secret link and create a secret

Actual results:

`Secret {{newImageSecret}} was created.` gets rendered in the alert (the variable is not interpolated)

Expected results:

`Secret <secret name> was created.` should render in the alert

Additional info:

 

Description of problem:

After upgrading from OpenShift 4.13 to 4.14 with Kuryr network type, the network operator shows as Degraded and the cluster version reports that it's unable to apply the 4.14 update. The issue seems to be related to mtu settings, as indicated by the message: "Not applying unsafe configuration change: invalid configuration: [cannot change mtu for the Pods Network]."

Version-Release number of selected component (if applicable):

Upgrading from 4.13 to 4.14
4.14.0-0.nightly-2023-09-15-233408
Kuryr network type
RHOS-17.1-RHEL-9-20230907.n.1

How reproducible:

Consistently reproducible on attempting to upgrade from 4.13 to 4.14.

Steps to Reproduce:

1. Install OpenShift version 4.13 on OpenStack.
2. Initiate an upgrade to OpenShift version 4.14.

Actual results:

The network operator shows as Degraded with the message:

network                                    4.13.13                              True        False         True       13h     Not applying unsafe configuration change: invalid configuration: [cannot change mtu for the Pods Network]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
 
Additionally, "oc get clusterversions" shows:

Unable to apply 4.14.0-0.nightly-2023-09-15-233408: wait has exceeded 40 minutes for these operators: network

Expected results:

The upgrade should complete successfully without any operator being degraded.

Additional info:

Some components remain at version 4.13.13 despite the upgrade attempt. Specifically, the dns, machine-config, and network operators are still at version 4.13.13:

$ oc get co
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE                                                                                                         
authentication                             4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
baremetal                                  4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
cloud-controller-manager                   4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
cloud-credential                           4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
cluster-autoscaler                         4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
config-operator                            4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
console                                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
control-plane-machine-set                  4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
csi-snapshot-controller                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
dns                                        4.13.13                              True        False         False      13h                                                                                                                     
etcd                                       4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
image-registry                             4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
ingress                                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
insights                                   4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
kube-apiserver                             4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
kube-controller-manager                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
kube-scheduler                             4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
kube-storage-version-migrator              4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
machine-api                                4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
machine-approver                           4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
machine-config                             4.13.13                              True        False         False      13h                                                                                                                     
marketplace                                4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
monitoring                                 4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
network                                    4.13.13                              True        False         True       13h     Not applying unsafe configuration change: invalid configuration: [cannot change mtu for the Pods Network]. Use 'oc edit network.operator.openshift.io cluster' to undo the change.
node-tuning                                4.14.0-0.nightly-2023-09-15-233408   True        False         False      12h                                                                                                                     
openshift-apiserver                        4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
openshift-controller-manager               4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
openshift-samples                          4.14.0-0.nightly-2023-09-15-233408   True        False         False      12h                                                                                                                     
operator-lifecycle-manager                 4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
operator-lifecycle-manager-catalog         4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
operator-lifecycle-manager-packageserver   4.14.0-0.nightly-2023-09-15-233408   True        False         False      12h                                                                                                                     
service-ca                                 4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h                                                                                                                     
storage                                    4.14.0-0.nightly-2023-09-15-233408   True        False         False      13h  
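
A hedged diagnostic to compare the MTU values involved, since the degraded condition complains about an MTU change for the pods network:

# Cluster-wide MTU currently recorded for the pods network:
oc get network.config cluster -o jsonpath='{.status.clusterNetworkMTU}{"\n"}'
# MTU-related fields in the operator configuration being applied:
oc get network.operator cluster -o yaml | grep -i mtu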

This is a clone of issue OCPBUGS-21594. The following is the description of the original issue:

Description of problem:

The MAPI metric mapi_current_pending_csr fires even when there are no pending MAPI CSRs; however, non-MAPI CSRs are present. The metric may not be appropriately scoped to only MAPI's own CSRs.

Version-Release number of selected component (if applicable):

Observed in 4.11.25

How reproducible:

Consistent

Steps to Reproduce:

1. Install a component that uses CSRs (like ACM) but leave the CSRs in a pending state
2. Observe metric firing
3.

Actual results:

Metric is firing

Expected results:

Metric only fires if there are MAPI specific CSRs pending

Additional info:

This impacts SRE alerting
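
A quick way to see which CSRs are pending and who signs/requests them (the metric should arguably only count node-bootstrap/kubelet CSRs, not e.g. ACM's; the exact intended scope is an assumption):

oc get csr -o custom-columns=NAME:.metadata.name,SIGNER:.spec.signerName,REQUESTOR:.spec.username
oc get csr | grep -c Pending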

This is a clone of issue OCPBUGS-23125. The following is the description of the original issue:

Description of problem:

A customer has one user who reports that they can intermittently impersonate any other user.
We checked that neither the user nor their associated groups have a rolebinding that grants this impersonation privilege.

Version-Release number of selected component (if applicable):

 

How reproducible:

Not reproducible

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:
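
A hedged RBAC audit that can help narrow this down (the affected user name is a placeholder):

# Who is allowed to impersonate users/groups cluster-wide?
oc adm policy who-can impersonate users
oc adm policy who-can impersonate groups
# Does RBAC actually grant it to the affected user?
oc auth can-i impersonate users --as=<affected-user>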

 

This is a clone of issue OCPBUGS-26210. The following is the description of the original issue:

This is a clone of issue OCPBUGS-25483. The following is the description of the original issue:

Description of problem:

A regression was identified creating LoadBalancer services in ARO in new 4.14 clusters (handled for new installations in OCPBUGS-24191)

The same regression has been also confirmed in ARO clusters upgraded to 4.14

Version-Release number of selected component (if applicable):

4.14.z

How reproducible:

On any ARO cluster upgraded to 4.14.z    

Steps to Reproduce:

    1. Install an ARO cluster
    2. Upgrade to 4.14 from fast channel
    3. oc create svc loadbalancer test-lb -n default --tcp 80:8080

Actual results:

# External-IP stuck in Pending
$ oc get svc test-lb -n default
NAME      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
test-lb   LoadBalancer   172.30.104.200   <pending>     80:30062/TCP   15m


# Errors in cloud-controller-manager being unable to map VM to nodes
$ oc logs -l infrastructure.openshift.io/cloud-controller-manager=Azure  -n openshift-cloud-controller-manager
I1215 19:34:51.843715       1 azure_loadbalancer.go:1533] reconcileLoadBalancer for service(default/test-lb) - wantLb(true): started
I1215 19:34:51.844474       1 event.go:307] "Event occurred" object="default/test-lb" fieldPath="" kind="Service" apiVersion="v1" type="Normal" reason="EnsuringLoadBalancer" message="Ensuring load balancer"
I1215 19:34:52.253569       1 azure_loadbalancer_repo.go:73] LoadBalancerClient.List(aro-r5iks3dh) success
I1215 19:34:52.253632       1 azure_loadbalancer.go:1557] reconcileLoadBalancer for service(default/test-lb): lb(aro-r5iks3dh/mabad-test-74km6) wantLb(true) resolved load balancer name
I1215 19:34:52.528579       1 azure_vmssflex_cache.go:162] Could not find node () in the existing cache. Forcely freshing the cache to check again...
E1215 19:34:52.714678       1 azure_vmssflex.go:379] fs.GetNodeNameByIPConfigurationID(/subscriptions/fe16a035-e540-4ab7-80d9-373fa9a3d6ae/resourceGroups/aro-r5iks3dh/providers/Microsoft.Network/networkInterfaces/mabad-test-74km6-master0-nic/ipConfigurations/pipConfig) failed. Error: failed to map VM Name to NodeName: VM Name mabad-test-74km6-master-0
E1215 19:34:52.714888       1 azure_loadbalancer.go:126] reconcileLoadBalancer(default/test-lb) failed: failed to map VM Name to NodeName: VM Name mabad-test-74km6-master-0
I1215 19:34:52.714956       1 azure_metrics.go:115] "Observed Request Latency" latency_seconds=0.871261893 request="services_ensure_loadbalancer" resource_group="aro-r5iks3dh" subscription_id="fe16a035-e540-4ab7-80d9-373fa9a3d6ae" source="default/test-lb" result_code="failed_ensure_loadbalancer"
E1215 19:34:52.715005       1 controller.go:291] error processing service default/test-lb (will retry): failed to ensure load balancer: failed to map VM Name to NodeName: VM Name mabad-test-74km6-master-0

Expected results:

# The LoadBalancer gets an External-IP assigned
$ oc get svc test-lb -n default 
NAME         TYPE           CLUSTER-IP       EXTERNAL-IP                            PORT(S)        AGE 
test-lb      LoadBalancer   172.30.193.159   20.242.180.199                         80:31475/TCP   14s

Additional info:

In cloud-provider-config cm in openshift-config namespace, vmType=""

When vmType gets changed to "standard" explicitly, the provisioning of the LoadBalancer completes and an ExternalIP gets assigned without errors.
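
To check the current setting (a simple sketch; the ConfigMap follows the standard cloud-provider-config layout):

oc extract cm/cloud-provider-config -n openshift-config --to=- | grep -i vmType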

Description of problem:

Backport of https://github.com/openshift/installer/pull/7457 as referred to in https://issues.redhat.com//browse/SPLAT-1170.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

We should disable the netlink mode of the netclass collector in Node Exporter. The netlink mode of the netclass collector was introduced into Node Exporter in 4.13. When using netlink mode, several metrics become unavailable, so disabling it avoids confusing users who upgrade the OCP cluster to a new version and find several metrics missing for their NICs.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

Using default config of CMO, Node Exporter's netclass collector is running in netlink mode.
The argument `--collector.netclass.netlink` is present in the `node-exporter` container in `node-exporter` daemonset.

Expected results:

Using default config of CMO, Node Exporter's netclass collector is running in classic mode. 
The argument `--collector.netclass.netlink` is absent in the `node-exporter` container in `node-exporter` daemonset.

Additional info:
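
A quick check of which mode is in effect on a given cluster (presence of the flag means netlink mode, absence means classic mode):

oc -n openshift-monitoring get ds node-exporter -o yaml | \
  grep -- '--collector.netclass.netlink' || echo "netlink flag not set (classic mode)"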

 

Description of problem:

Add storage admission plugin "storage.openshift.io/CSIInlineVolumeSecurity"

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Create OCP cluster v4.13
2. Check config map kas-config

Actual results:

The CM does not include "storage.openshift.io/CSIInlineVolumeSecurity" storage plugin

Expected results:

The plugin should be included

Additional info:

 

Description of problem:

At the moment we are using an alpha version of controller-runtime in the machine-api-operator.
Now that controller-runtime v0.15.0 is out, we want to bump to it.

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info:

 

Description of problem:

When deploying with external platform, the reported state of the machine config pool is degraded, and we can observe a drift in the configuration:

$ diff /etc/mcs-machine-config-content.json ~/rendered-master-1b6aab788192600896f36c5388d48374
<                         "contents": "[Unit]\nDescription=Kubernetes Kubelet\nWants=rpc-statd.service network-online.target\nRequires=crio.service kubelet-auto-node-size.service\nAfter=network-online.target crio.service kubelet-auto-node-size.service\nAfter=ostree-finalize-staged.service\n\n[Service]\nType=notify\nExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests\nExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state\nExecStartPre=/bin/rm -f /var/lib/kubelet/memory_manager_state\nEnvironmentFile=/etc/os-release\nEnvironmentFile=-/etc/kubernetes/kubelet-workaround\nEnvironmentFile=-/etc/kubernetes/kubelet-env\nEnvironmentFile=/etc/node-sizing.env\n\nExecStart=/usr/local/bin/kubenswrapper \\\n    /usr/bin/kubelet \\\n      --config=/etc/kubernetes/kubelet.conf \\\n      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \\\n      --kubeconfig=/var/lib/kubelet/kubeconfig \\\n      --container-runtime-endpoint=/var/run/crio/crio.sock \\\n      --runtime-cgroups=/system.slice/crio.service \\\n      --node-labels=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \\\n      --node-ip=${KUBELET_NODE_IP} \\\n      --minimum-container-ttl-duration=6m0s \\\n      --cloud-provider=external \\\n      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \\\n       \\\n      --hostname-override=${KUBELET_NODE_NAME} \\\n      --provider-id=${KUBELET_PROVIDERID} \\\n      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \\\n      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bde9fb486f1e8369b465a8c0aff7152c2a1f5a326385ee492140592b506638d6 \\\n      --system-reserved=cpu=${SYSTEM_RESERVED_CPU},memory=${SYSTEM_RESERVED_MEMORY},ephemeral-storage=${SYSTEM_RESERVED_ES} \\\n      --v=${KUBELET_LOG_LEVEL}\n\nRestart=always\nRestartSec=10\n\n[Install]\nWantedBy=multi-user.target\n",
---
>                         "contents": "[Unit]\nDescription=Kubernetes Kubelet\nWants=rpc-statd.service network-online.target\nRequires=crio.service kubelet-auto-node-size.service\nAfter=network-online.target crio.service kubelet-auto-node-size.service\nAfter=ostree-finalize-staged.service\n\n[Service]\nType=notify\nExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests\nExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state\nExecStartPre=/bin/rm -f /var/lib/kubelet/memory_manager_state\nEnvironmentFile=/etc/os-release\nEnvironmentFile=-/etc/kubernetes/kubelet-workaround\nEnvironmentFile=-/etc/kubernetes/kubelet-env\nEnvironmentFile=/etc/node-sizing.env\n\nExecStart=/usr/local/bin/kubenswrapper \\\n    /usr/bin/kubelet \\\n      --config=/etc/kubernetes/kubelet.conf \\\n      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \\\n      --kubeconfig=/var/lib/kubelet/kubeconfig \\\n      --container-runtime-endpoint=/var/run/crio/crio.sock \\\n      --runtime-cgroups=/system.slice/crio.service \\\n      --node-labels=node-role.kubernetes.io/control-plane,node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \\\n      --node-ip=${KUBELET_NODE_IP} \\\n      --minimum-container-ttl-duration=6m0s \\\n      --cloud-provider= \\\n      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \\\n       \\\n      --hostname-override=${KUBELET_NODE_NAME} \\\n      --provider-id=${KUBELET_PROVIDERID} \\\n      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \\\n      --pod-infra-container-image=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:bde9fb486f1e8369b465a8c0aff7152c2a1f5a326385ee492140592b506638d6 \\\n      --system-reserved=cpu=${SYSTEM_RESERVED_CPU},memory=${SYSTEM_RESERVED_MEMORY},ephemeral-storage=${SYSTEM_RESERVED_ES} \\\n      --v=${KUBELET_LOG_LEVEL}\n\nRestart=always\nRestartSec=10\n\n[Install]\nWantedBy=multi-user.target\n",


The difference is --cloud-provider=external vs. --cloud-provider= in the flags passed to the kubelet.


We also observe the following log in the MCC:
W0629 09:57:44.583046       1 warnings.go:70] unknown field "spec.infra.status.platformStatus.external.cloudControllerManager"


"spec.infra.status.platformStatus.external.cloudControllerManager" is basically the flag in the Infrastructure object that enables the external platform.

Version-Release number of selected component (if applicable):

4.14 nightly

How reproducible:

Always when platform is external

Steps to Reproduce:

1. Deploy a cluster with the external platform enabled, the featureSet TechPreviewNoUpgrade should be set and the Infrastructure object should look like:

apiVersion: config.openshift.io/v1
kind: Infrastructure
metadata:
  creationTimestamp: "2023-06-28T10:37:12Z"
  generation: 1
  name: cluster
  resourceVersion: "538"
  uid: 57e09773-0eca-4767-95ce-8ec7d0f2cdae
spec:
  cloudConfig:
    name: ""
  platformSpec:
    external:
      platformName: oci
    type: External
status:
  apiServerInternalURI: https://api-int.test-infra-cluster-3cd17632.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  apiServerURL: https://api.test-infra-cluster-3cd17632.assisted-ci.oci-rhelcert.edge-sro.rhecoeng.com:6443
  controlPlaneTopology: HighlyAvailable
  cpuPartitioning: None
  etcdDiscoveryDomain: ""
  infrastructureName: test-infra-cluster-3c-pqqqm
  infrastructureTopology: HighlyAvailable
  platform: External
  platformStatus:
    external:
      cloudControllerManager:
        state: External
    type: External
2. Observe the drift with: oc get mcp

Actual results:

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master                                                      False     True       True       3              0                   0                     3                      138m
worker   rendered-worker-d48036fe2b657e6c71d5d1275675fefc   True      False      False      3              3                   3                     0                      138m

Expected results:

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-2ff4e25f807ef3b20b7c6e0c6526f05d   True      False      False      3              3                   3                     0                      33m
worker   rendered-worker-48b7f39d78e3b1d94a0aba1ef4358d01   True      False      False      3              3                   3                     0                      33m

Additional info:

https://redhat-internal.slack.com/archives/C02CZNQHGN8/p1688035248716119

This is a clone of issue OCPBUGS-26765. The following is the description of the original issue:

Description of problem:

The SAST scans keep coming up with bogus positive results from test and vendor files. This bug is just a placeholder to allow us to backport the change to ignore those files.

Version-Release number of selected component (if applicable):

    

How reproducible:

    

Steps to Reproduce:

    1.
    2.
    3.
    

Actual results:

    

Expected results:

    

Additional info:

    

Description of problem:

When using the k8sResourcePrefix x-descriptor with custom resource kinds, the form-view dropdown currently doesn't accept the initial user selection, requiring the user to make their selection twice. Also, if the configuration panel contains multiple custom resource dropdowns, each previous dropdown selection on the panel is cleared each time the user configures another custom resource dropdown, requiring the user to reconfigure each previous selection. Here's an example of my configuration below:

specDescriptors:
          - displayName: Collection
            path: collection
            x-descriptors:
              - >-
                urn:alm:descriptor:io.kubernetes:abc.zzz.com:v1beta1:Collection
          - displayName: Endpoints
            path: 'mapping[0].endpoints[0].name'
            x-descriptors:
              - >-
                urn:alm:descriptor:io.kubernetes:abc.zzz.com:v1beta1:Endpoint
          - displayName: Requested Credential Secret
            path: 'mapping[0].endpoints[0].credentialName'
            x-descriptors:
              - 'urn:alm:descriptor:io.kubernetes:Secret'
          - displayName: Namespaces
            path: 'mapping[0].namespace'
            x-descriptors:
              - 'urn:alm:descriptor:io.kubernetes:Namespace'

With this configuration, when a user wants to select a Collection or Endpoint from the form-view dropdown, the user is forced to make their selection twice before the selection is accepted. Also, if the user configures the Collection dropdown and then decides to configure the Endpoint dropdown, once the Endpoint selection is made, the Collection dropdown is cleared.

Version-Release number of selected component (if applicable):

4.8

How reproducible:

Always

Steps to Reproduce:

1. Create a new project: 
  oc new-project descriptor-test
2. Create the resources in this gist: 
  oc create -f https://gist.github.com/TheRealJon/99aa89c4af87c4b68cd92a544cd7c08e/raw/a633ad172ff071232620913d16ebe929430fd77a/reproducer.yaml
3. In the admin console, go to the installed operators page in project 'descriptor-test'
4. Select Mock Operator from the list
5. Select "Create instance" in the Mock Resource provided API card
6. Scroll to the field-1
7. Select 'example-1' from the dropdown

Actual results:

Selection is not retained on the first click.

Expected results:

The selection should be retained on the first click.

Additional info:

In addition to this behavior, if a form has multiple k8sResourcePrefix dropdown fields, they all get cleared when attempting to select an item from one of them.

Description of problem:

An uninstall was started; however, it failed because the hosted-cluster-config-operator was unable to clean up the default ingresscontroller.

Version-Release number of selected component (if applicable):

4.12.18

How reproducible:

Unsure - though definitely not 100%

Steps to Reproduce:

1. Uninstall a HyperShift cluster

Actual results:

❯ k logs -n ocm-staging-2439occi66vhbj0pee3s4d5jpi4vpm54-mshen-dr2 hosted-cluster-config-operator-5ccdbfcc4c-9mxfk --tail 10 -f

{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"Image registry is removed","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"Ensuring ingress controllers are removed","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"Ensuring load balancers are removed","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"Load balancers are removed","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"Ensuring persistent volumes are removed","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"There are no more persistent volumes. Nothing to cleanup.","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}
{"level":"info","ts":"2023-06-06T16:57:21Z","msg":"Persistent volumes are removed","controller":"resources","object":{"name":""},"namespace":"","name":"","reconcileID":"3a8e4485-3d0a-41b7-b82c-ff0a7f0040e6"}

After manually connecting to the hosted cluster and deleting the ingresscontroller, the uninstall progressed and succeeded.

Expected results:

The hosted cluster should clean up the ingresscontrollers successfully so the uninstall can progress.

Additional info:

HyperShift dump: https://drive.google.com/file/d/1qqjkG4F_mSUCVMz3GbN-lEoqbshPvQcU/view?usp=sharing 

Please review the following PR: https://github.com/openshift/oc/pull/1408

The PR has been automatically opened by ART (#aos-art) team automation and indicates
that the image(s) being used downstream for production builds are not consistent
with the images referenced in this component's github repository.

Differences in upstream and downstream builds impact the fidelity of your CI signal.

If you disagree with the content of this PR, please contact @release-artists
in #aos-art to discuss the discrepancy.

Closing this issue without addressing the difference will cause the issue to
be reopened automatically.

This is a clone of issue OCPBUGS-20407. The following is the description of the original issue:

Description of problem:

When setting up transient mounts, which are used for exposing CA certificates and RPM package repositories to a build, a recent change we made in the builder attempted to replace simple bind mounts with overlay mounts. While this might have made things easier for unprivileged builds, we overlooked that overlay mounts can only be made over directories, not files, so we need to revert the change.

Version-Release number of selected component (if applicable):

4.14.0

How reproducible:

Always

Steps to Reproduce:

Per https://redhat-internal.slack.com/archives/C014MHHKUSF/p1696882408656359?thread_ts=1696882334.352129&cid=C014MHHKUSF,
1. oc new-app -l app=pvg-nodejs --name pvg-nodejs https://github.com/openshift/nodejs-ex.git

Actual results:

mount /var/lib/containers/storage/overlay-containers/9c3877f3062cc18b01f30db310e0e2bd0a1cd4527d74f41c313399e48fa81d23/userdata/overlay/145259665/merge:/run/secrets/redhat.repo (via /proc/self/fd/6), data: lowerdir=/tmp/redhat.repo-copy2014834134/redhat.repo,upperdir=/var/lib/containers/storage/overlay-containers/9c3877f3062cc18b01f30db310e0e2bd0a1cd4527d74f41c313399e48fa81d23/userdata/overlay/145259665/upper,workdir=/var/lib/containers/storage/overlay-containers/9c3877f3062cc18b01f30db310e0e2bd0a1cd4527d74f41c313399e48fa81d23/userdata/overlay/145259665/work: *invalid argument*"

Expected results:

Successful setup for a transient mount to the redhat.repo file for a RUN instruction.

Additional info:

Bug introduced in https://github.com/openshift/builder/pull/349, should be fixed in https://github.com/openshift/builder/pull/359.

Tracker issue for bootimage bump in 4.14. This issue should block issues which need a bootimage bump to fix.

The previous bump was OCPBUGS-10738.

Seeing segfault failures related to HAProxy on multiple platforms that begin around the same time as the [HAProxy bump|http://example.com], like:

nodes/ci-op-5s09hi2q-0dd98-rwds8-worker-centralus1-8nkx5/journal.gz:Apr 10 06:21:54.317971 ci-op-5s09hi2q-0dd98-rwds8-worker-centralus1-8nkx5 kernel: haproxy[302399]: segfault at 0 ip 0000556eadddafd0 sp 00007fff0cceed50 error 4 in haproxy[556eadc00000+2a3000]

Sippy Node Process Segfaulted

release-master-ci-4.14-upgrade-from-stable-4.13-e2e-azure-sdn-upgrade/1645265104259780608

periodic-ci-openshift-release-master-ci-4.14-e2e-gcp-ovn-upgrade/1645265114720374784

periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade/1644449798939480064

Description of problem:
A customer wants to restrict access to the vCenter API, and the originating traffic needs to use a configured EgressIP. This works fine for the Machine API, but the vSphere CSI driver controller uses the host network, so the configured EgressIP isn't used.

Is it possible to disable this (the use of host networking) for the CSI controller?

slack thread: https://redhat-internal.slack.com/archives/CBQHQFU0N/p1683135077822559

As a developer, I would like a Makefile target that performs all the pre-commit checks that should be run before committing any code to GitHub. This includes updating Golang and API dependencies, building the source code, building the e2e tests, verifying source code formatting, and running unit tests.

Description of problem:

Pull-through only checks for ICSP, ignoring IDMS/ITMS.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

1. Create an IDMS/ITMS rule, for example with the following IDMS and ITMS specifics:

apiVersion: config.openshift.io/v1
kind: ImageDigestMirrorSet
metadata:
  name: digest-mirror
spec:
  imageDigestMirrors:
  - mirrors:
    - registry.access.redhat.com/ubi8/ubi-minimal
    source: quay.io/podman/hello
    mirrorSourcePolicy: NeverContactSource

apiVersion: config.openshift.io/v1
kind: ImageTagMirrorSet
metadata:
  name: tag-mirror
spec:
  imageTagMirrors:
  - mirrors:
    - registry.access.redhat.com/ubi8/ubi-minimal
    source: quay.io/podman/hello
    mirrorSourcePolicy: NeverContactSource

2. Create an image stream with `referencePolicy: local` (a minimal sketch follows these steps). Example: https://gist.github.com/flavianmissi/0518239edd6f51d54b5633212f2b2ac9
3. Pull the image from the image stream created above. Example `oc new-app test-1:latest` 
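
A minimal image stream sketch for step 2, under the assumption that the quay.io/podman/hello source from the mirror rules above is used; the stream and tag names are illustrative:

apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: test-1
spec:
  tags:
  - name: latest
    from:
      kind: DockerImage
      name: quay.io/podman/hello:latest
    referencePolicy:
      type: Local   # local reference policy routes pulls through the integrated registry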

Actual results:

 

Expected results:

 

Additional info:

 

This is a clone of issue OCPBUGS-20478. The following is the description of the original issue:

Description of problem:

The secret/vmware-vsphere-cloud-credentials in ns/openshift-cluster-csi-drivers is not synced correctly when updating secret/vsphere-creds in ns/kube-system

Version-Release number of selected component (if applicable):

4.14.0-0.nightly-2023-10-10-084534

How reproducible:

Always

Steps to Reproduce:

  1. Before updating the secret
$ oc -n kube-system get secret vsphere-creds -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
kind: Secret
metadata:
  annotations:
    cloudcredential.openshift.io/mode: passthrough
...

Same for the secret/vmware-vsphere-cloud-credentials in ns/openshift-cluster-csi-drivers

$ oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
kind: Secret
metadata:
  annotations:
    cloudcredential.openshift.io/credentials-request: openshift-cloud-credential-operator/openshift-vmware-vsphere-csi-driver-operator
  2. Replace secret/vsphere-creds to use a new vCenter (just for test)
$ oc -n kube-system get secret vsphere-creds -o yaml 
apiVersion: v1
data:
  vcsa2-qe.vmware.devcluster.openshift.com.password: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.username: xxx
(Updated to vcsa2-qe)

There are two vCenter entries in vmware-vsphere-cloud-credentials:

$ oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.password: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.username: xxx
(devqe and vcsa2-qe)
  3. Restore secret/vsphere-creds
$ oc -n kube-system get secret vsphere-creds -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
(Updated to devqe)

There are still two vCenter entries in vmware-vsphere-cloud-credentials:

$ oc -n openshift-cluster-csi-drivers get secret vmware-vsphere-cloud-credentials -o yaml
apiVersion: v1
data:
  vcenter.devqe.ibmc.devcluster.openshift.com.password: xxx
  vcenter.devqe.ibmc.devcluster.openshift.com.username: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.password: xxx
  vcsa2-qe.vmware.devcluster.openshift.com.username: xxx
(devqe and vcsa2-qe)

Actual results:

The secret/vmware-vsphere-cloud-credentials is not synced correctly

Expected results:

The secret/vmware-vsphere-cloud-credentials should be synced correctly

Additional info:

The vSphere CSI driver controller pods are crash-looping.

Other Incomplete

This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were not completed when this image was assembled

We need to validate that we are able to recover a hosted cluster's etcd (backed by storage such as LVM or HPP) when an underlying management cluster node disappears.

In this scenario, we need to understand what happens when an etcd instance fails and the underlying PVC is permanently gone. Will the etcd operator be able to detect this and recover, or will the etcd cluster in question remain in a degraded state indefinitely? Those are the types of questions that need answers, which will help guide the next steps for supporting local storage for etcd.

Description of problem:

Various jobs are failing in e2e-gcp-operator because the LoadBalancer-type Service does not become "ready", which most likely means it is not getting an IP address.

Tests so far affected are:
- TestUnmanagedDNSToManagedDNSInternalIngressController
- TestScopeChange
- TestInternalLoadBalancerGlobalAccessGCP
- TestInternalLoadBalancer
- TestAllowedSourceRanges

For example, in TestInternalLoadBalancer, the Load Balancer never comes back ready:

operator_test.go:1454: Expected conditions: map[Admitted:True Available:True DNSManaged:True DNSReady:True LoadBalancerManaged:True LoadBalancerReady:True]
         Current conditions: map[Admitted:True Available:False DNSManaged:True DNSReady:False Degraded:True DeploymentAvailable:True DeploymentReplicasAllAvailable:True DeploymentReplicasMinAvailable:True DeploymentRollingOut:False EvaluationConditionsDetected:False LoadBalancerManaged:True LoadBalancerProgressing:False LoadBalancerReady:False Progressing:False Upgradeable:True]

Where DNSReady:False and LoadBalancerReady:False.

Version-Release number of selected component (if applicable):

4.14

How reproducible:

10% of the time

Steps to Reproduce:

1. Run e2e-gcp-operator many times until you see one of these failures

Actual results:

Test Failure

Expected results:

No test failure

Additional info:

Search.CI Links:
TestScopeChange
TestInternalLoadBalancerGlobalAccessGCP & TestInternalLoadBalancer 

This does not seem related to https://issues.redhat.com/browse/OCPBUGS-6013. The DNS E2E tests actually pass this same condition check.

validatePublishingStrategyMapping

For each Service, we should validate that when the publishing type is Route, a hostname is set.

Not setting the hostname is unsupported, so the validation prevents this and protects consumers from relying on unsupported setups.

https://redhat-internal.slack.com/archives/C04EUL1DRHC/p1685371010883569?thread_ts=1685363583.881959&cid=C04EUL1DRHC
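
As a hedged illustration of the shape the validation expects (the service name and hostname below are placeholders; confirm field names against the current HyperShift HostedCluster API), a passing services entry would look roughly like:

spec:
  services:
  - service: APIServer
    servicePublishingStrategy:
      type: Route
      route:
        hostname: api.example.hypershift.local   # hostname must be set when type is Route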

 

I have a console extension (https://github.com/gnunn1/dev-console-plugin) that simply adds the Topology and Add+ views to the Admin perspective but otherwise should expose no modules. However, if I try to build this extension without any exposedModules, the webpack assembly fails with the stack trace below.

As a workaround, I'm leaving in the example module from the template and just removing it from being added to the OpenShift menu.

$ yarn run build                                                                                                                                                                                                            main 
yarn run v1.22.19
$ yarn clean && NODE_ENV=production yarn ts-node node_modules/.bin/webpack
$ rm -rf dist
$ ts-node -O '{"module":"commonjs"}' node_modules/.bin/webpack
[webpack-cli] HookWebpackError: Called Compilation.updateAsset for not existing filename plugin-entry.js
    at makeWebpackError (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/HookWebpackError.js:48:9)
    at /home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/Compilation.js:3058:12
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:41:1)
    at fn (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/Compilation.js:479:17)
    at _next0 (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:39:1)
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:52:1)
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:13:1)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
-- inner error --
Error: Called Compilation.updateAsset for not existing filename plugin-entry.js
    at Compilation.updateAsset (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/Compilation.js:4298:10)
    at /home/gnunn/Development/openshift/dev-console-plugin/node_modules/src/webpack/ConsoleAssetPlugin.ts:82:23
    at fn (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/Compilation.js:477:10)
    at _next0 (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:39:1)
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:52:1)
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:13:1)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
caused by plugins in Compilation.hooks.processAssets
Error: Called Compilation.updateAsset for not existing filename plugin-entry.js
    at Compilation.updateAsset (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/Compilation.js:4298:10)
    at /home/gnunn/Development/openshift/dev-console-plugin/node_modules/src/webpack/ConsoleAssetPlugin.ts:82:23
    at fn (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/webpack/lib/Compilation.js:477:10)
    at _next0 (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:39:1)
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:52:1)
    at eval (eval at create (/home/gnunn/Development/openshift/dev-console-plugin/node_modules/tapable/lib/HookCodeFactory.js:33:10), <anonymous>:13:1)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
error Command failed with exit code 2.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 2.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

 

Description of problem:

The node debug console is not available on all nodes when deploying HyperShift on KubeVirt using the 'hypershift create cluster kubevirt' default root-volume-size (16 GB).

Version-Release number of selected component (if applicable):

(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc version
Client Version: 4.12.0-0.nightly-2023-04-01-095001
Kustomize Version: v4.5.7
Server Version: 4.12.8
Kubernetes Version: v1.25.7+eab9cc9

How reproducible:

happens all the time.

Steps to Reproduce:

  1. The setup I deployed is a hub cluster of 3 masters + 3 workers with a 100G disk each; on that, I deployed a hosted cluster with 2 workers using the default 16G disk.

Actual results:

(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc debug node/hyper-1-kd7sm
Temporary namespace openshift-debug-5cctb is created for debugging node...
Starting pod/hyper-1-kd7sm-debug ...
To use host binaries, run `chroot /host`

Removing debug pod ...
Temporary namespace openshift-debug-5cctb was removed.
Error from server (BadRequest): container "container-00" in pod "hyper-1-kd7sm-debug" is not available
(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]

Expected results:

(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc debug node/hyper-1-rkkkm
Temporary namespace openshift-debug-v6xr8 is created for debugging node...
Starting pod/hyper-1-rkkkm-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.128.2.76
If you don't see a command prompt, try pressing enter.
sh-4.4# 

Additional info:

1. In the output of:

(.venv) [kni@ocp-edge77 ocp-edge-auto_cluster]$ oc describe node hyper-1-kd7sm 

Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sun, 23 Apr 2023 17:27:02 +0300   Sun, 02 Apr 2023 19:45:20 +0300   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     True    Sun, 23 Apr 2023 17:27:02 +0300   Sat, 15 Apr 2023 00:10:46 +0300   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure      False   Sun, 23 Apr 2023 17:27:02 +0300   Sun, 02 Apr 2023 19:45:20 +0300   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sun, 23 Apr 2023 17:27:02 +0300   Sun, 02 Apr 2023 19:47:53 +0300   KubeletReady                 kubelet is posting ready status

 

2. Deploying with a non-default value of --root-volume-size=64 works fine.

3. [root@ocp-edge44 ~]# oc get catalogsource -n openshift-marketplace
NAME                  DISPLAY                                 TYPE   PUBLISHER   AGE
certified-operators   Certified Operators                     grpc   Red Hat     27h
community-operators   Community Operators                     grpc   Red Hat     27h
mce-custom-registry   2.2.4-DOWNANDBACK-2023-04-20-19-04-35   grpc   Red Hat     26h
redhat-marketplace    Red Hat Marketplace                     grpc   Red Hat     27h
redhat-operators      Red Hat Operators                       grpc   Red Hat     27h

 

Currently we save every installer binary we have ever needed to the filesystem. If users use many different versions, the pod reaches its storage limit, since each binary is ~500 MB.

We should add a TTL to the installer cache and remove binaries that are no longer used.

User Story:

We enabled balance similar node groups via https://issues.redhat.com/browse/OCPBUGS-15769

We should include a validation for this behaviour in our e2e autoscaler testing.

We can probably reuse what we do in the Machine API test https://github.com/openshift/cluster-api-actuator-pkg/blob/77764237f2e6160d95990dc905b8e87662bc4d16/pkg/autoscaler/autoscaler.go#L437

Acceptance Criteria:

Description of criteria:

  • Upstream documentation
  • Point 1
  • Point 2
  • Point 3

(optional) Out of Scope:

Detail about what is specifically not being delivered in the story

Engineering Details:

This requires/does not require a design proposal.
This requires/does not require a feature gate.

DoD:

Either enforce immutability in the API via CEL, or add first-class support for mutability, i.e., enable a node rollout when the field is changed.
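
For the CEL option, a minimal sketch of how immutability is typically enforced in a CRD schema; the field name below is a placeholder, not the actual HyperShift schema:

openAPIV3Schema:
  type: object
  properties:
    spec:
      type: object
      properties:
        exampleImmutableField:          # placeholder for the field to freeze
          type: string
          x-kubernetes-validations:
          - rule: "self == oldSelf"     # rejects any update that changes the value
            message: "exampleImmutableField is immutable"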

Description of problem:

Picked up 4.14-ec-4 (which uses cgroups v1 as the default) and tried to create a cluster with the following PerformanceProfile (and corresponding MCP) by placing them in the manifests folder:

 
apiVersion: performance.openshift.io/v2
kind: PerformanceProfile
metadata:
  name: clusterbotpp
spec:
  cpu:
    isolated: "1-3"
    reserved: "0"
  realTimeKernel:
    enabled: false
  nodeSelector:
    node-role.kubernetes.io/worker: ""
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/worker: ""

and, 

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker 
spec:
  machineConfigSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker: ""

The cluster often fails to install because bootkube spends a lot of time chasing this error:

 
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Created "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Failed to update status for the "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 11f98d74-af1b-4a4c-9692-6dce56ee5cd9, UID in object meta:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: [#1717] failed to create some manifests:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: "clusterbotpp_kubeletconfig.yaml": failed to update status for kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 11f98d74-af1b-4a4c-9692-6dce56ee5cd9, UID in object meta:
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Created "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n
Sep 06 18:32:43 ip-10-0-145-107 bootkube.sh[4925]: Failed to update status for the "clusterbotpp_kubeletconfig.yaml" kubeletconfigs.v1.machineconfiguration.openshift.io/performance-clusterbotpp -n : Operation cannot be fulfilled on kubeletconfigs.machineconfiguration.openshift.io "performance-clusterbotpp": StorageError: invalid object, Code: 4, Key: /kubernetes.io/machineconfiguration.openshift.io/kubeletconfigs/performance-clusterbotpp, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 597dfcf3-012d-4730-912a-78efabb920ba, UID in object meta:

This leads to worker nodes not getting ready in time, which leads to the installer marking the cluster installation as failed. Ironically, even after the installer returns with a failure, if you wait long enough I have (sometimes) observed the cluster eventually reconcile and the worker nodes get provisioned.

I am attaching the installation logs from one such run with this issue. 

 

 

Version-Release number of selected component (if applicable):

4.14

How reproducible:

Often

Steps to Reproduce:

1. Try to install a new cluster by placing a PerformanceProfile in the manifests folder
2.
3.

Actual results:

Cluster installation failed. 

Expected results:

Cluster installation should succeed. 

Additional info:

Also, I didn't observe this occurring in 4.13.9. 

Need to follow up https://issues.redhat.com//browse/HOSTEDCP-1065 with an e2e to test ControlPlaneRelease functionality:

Test should:

  • Set the ControlPlaneRelease for the HC
  • Wait for HC rollout
  • Ensure no HCP pod container is using the Release payload (except CVO prepare-payload)
  • Ensure no guest cluster pod container is using the ControlPlaneRelease
  • Ensure no node rollouts occur
  • Ensure ClusterVersion in the guest cluster reflects Release version
  • Ensure all ClusterOperators in the guest cluster reflect Release version
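
For setting up the test, a hedged sketch of the HostedCluster fields involved, assuming the controlPlaneRelease API introduced in HOSTEDCP-1065 (images and names are placeholders; confirm against the current HyperShift API):

apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
spec:
  release:
    image: quay.io/openshift-release-dev/ocp-release:4.14.0-x86_64     # consumed by the guest cluster and node rollouts
  controlPlaneRelease:
    image: quay.io/openshift-release-dev/ocp-release:4.14.1-x86_64     # consumed only by hosted control plane pods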

This is a clone of issue OCPBUGS-26014. The following is the description of the original issue:

Description of problem:

While testing oc adm upgrade status against b02, I noticed some COs do not have any annotations, while I expected them to have the include/exclude.release.openshift.io/* ones (to recognize COs that come from the payload).

$ b02 get clusteroperator etcd -o jsonpath={.metadata.annotations}
$ ota-stage get clusteroperator etcd -o jsonpath={.metadata.annotations}
{"exclude.release.openshift.io/internal-openshift-hosted":"true","include.release.openshift.io/self-managed-high-availability":"true","include.release.openshift.io/single-node-developer":"true"}

CVO only precreates CO resources; it does not reconcile or touch them once they exist. Build02 does not have COs with reconciled metadata because it was born as 4.2, which (AFAIK) predates OCP's use of the exclude/include annotations.

Version-Release number of selected component (if applicable):

4.16 (development branch)

How reproducible:

deterministic

Steps to Reproduce:

1. delete an annotation on a ClusterOperator resource

Actual results:

The annotation won't be recreated.

Expected results:

The annotation should be recreated

Description of the problem:

When machines have multiple IP addresses assigned to the same network interface the assisted service will create the bare metal host configuration using the first IP address of the interface. That IP address may or may not be inside the machine CIDR of the cluster. If it isn't then the bare metal host will have an IP address that is different to the IP address of the corresponding node. As a result of that the machine operator will not link the machine and the node, and the machine will never move to the `Running` phase. In that situation the corresponding machine pool will never have the minimum required number of replicas. For worker machine pools that means that the cluster will never be considered completely installed.

How reproducible:

Note that this is easy to reproduce using the current zero touch provisioning factory workflow, because when machines have a single NIC they will have two IP addresses assigned. It may be harder to reproduce in other scenarios.

Steps to reproduce:

1. Create a bare metal cluster with three control plane nodes and one worker node, where nodes have one NIC and two IP addresses assigned to that NIC. In the ZTPFW scenario that will be a static IP address in the 192.168.7.0/24 range (which is the machine CIDR of the cluster) and another IP address assigned via DHCP, say in the 192.168.150.0/24 range (which is not the machine CIDR of the cluster).

2. Start the installation.

3. Check the manifests generated by the assisted service, in particular the `99_openshift-cluster-api_hosts-*.yaml` files. Those will contain the definition of the bare metal hosts, together with a `baremetalhost.metal3.io/status` annotation that contains the status that they should have. Check that it contains the wrong IP address in the 192.168.150.0/24 range, outside of the machine CIDR of the cluster.

4. Check that all the machines (oc get machine -A) didn't move to the `Running` phase. That is because the machine API operator can't link them to the nodes due to the mismatched IP addresses: nodes have 192.168.7.* and machines have 192.168.150.* (copied from the bare metal hosts).

5. Check that the worker machine pool doesn't have the minimum required number of replicas.

6. Check that the installation doesn't complete.

Actual results:

The machines aren't in the `Running` phase, the worker pool doesn't have the minimum required number of replicas and the installation doesn't complete.
 
Expected results:

All the machines should move to the `Running` phase, the worker pool should have the minimum required number of replicas and the installation should complete.

We updated 4.15 to latest from upstream here:
https://github.com/openshift/apiserver-network-proxy/pull/43

An update of the 4.14 branch is needed to get the latest fixes into the 4.14 release stream (requested by IBM)

Kube 1.26 introduced the warning level TopologyAwareHintsDisabled event. TopologyAwareHintsDisabled is fired by the EndpointSliceController whenever reconciling a service that has activated topology aware hints via the service.kubernetes.io/topology-aware-hints annotation, but there is not enough information in the existing cluster resources (typically nodes) to apply the topology aware hints.

When rebasing OpenShift onto Kube 1.26, our CI builds are failing (except on AWS), because these events are firing "pathologically", for example:

: [sig-arch] events should not repeat pathologically
  events happened too frequently event happened 83 times, something is wrong: ns/openshift-dns service/dns-default - reason/TopologyAwareHintsDisabled Insufficient Node information: allocatable CPU or zone not specified on one or more nodes, addressType: IPv4 result=reject 

AWS nodes seem to have the proper values. GCP nodes have the values also, but they are not "right" for the purposes of the EndpointSliceController:

event happened 38 times, something is wrong: ns/openshift-dns service/dns-default - reason/TopologyAwareHintsDisabled Unable to allocate minimum required endpoints to each zone without exceeding overload threshold (5 endpoints, 3 zones), addressType: IPv4 result=reject }

https://github.com/openshift/origin/pull/27666 will mask this problem (make it stop erroring in CI) but changes still need to be made in the product so end users are not subjected to these events.
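
For context, a minimal sketch of the opt-in that triggers these events; the upstream annotation is service.kubernetes.io/topology-aware-hints, while the Service name, selector, and ports below are illustrative rather than the actual dns-default spec:

apiVersion: v1
kind: Service
metadata:
  name: example-hinted-service
  annotations:
    service.kubernetes.io/topology-aware-hints: "auto"   # asks the EndpointSlice controller to add zone hints
spec:
  selector:
    app: example
  ports:
  - port: 53
    protocol: UDP
    targetPort: 5353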

Now links to:
test=[sig-arch] events should not repeat pathologically for namespace openshift-dns

 

This is a clone of issue OCPBUGS-27486. The following is the description of the original issue:

This is a clone of issue OCPBUGS-25662. The following is the description of the original issue:

Description of problem:

In ROSA/OCP 4.14.z, attaching the AmazonEC2ContainerRegistryReadOnly policy to the worker nodes (in ROSA's case, this was attached to the ManagedOpenShift-Worker-Role, which is assigned by the installer to all the worker nodes) has no effect on ECR image pulls. The user gets an authentication error. Attaching the policy should ideally remove the need to provide an image pull secret; however, the error is resolved only if the user also provides an image pull secret.
This is proven to work correctly in 4.12.z. It seems something has changed in recent OCP versions.

Version-Release number of selected component (if applicable):

4.14.2 (ROSA)

How reproducible:

The issue is reproducible using the below steps.

Steps to Reproduce:

    1. Create a deployment in ROSA or OCP on AWS, pointing at a private ECR repository
    2. The image pulling will fail with Error: ErrImagePull & authentication required errors
    3.

Actual results:

The image pull fails with "Error: ErrImagePull" & "authentication required" errors. However, the image pull is successful only if the user provides an image-pull-secret to the deployment.

Expected results:

The image should be pulled successfully by virtue of the ECR-read-only policy attached to the worker node role; without needing an image-pull-secret. 

Additional info:


In other words:

In OCP 4.13 (and below), if a user adds the ECR:* permissions to the worker instance profile, then the user can specify ECR images and authentication of the worker node to ECR is done using the instance profile. In 4.14 this no longer works.

It is not sufficient, as an alternative, to provide a pull secret in a deployment, because AWS rotates ECR tokens every 12 hours. That is not a viable solution for customers that, until OCP 4.13, did not have to rotate pull secrets constantly.

The experience in 4.14 should be the same as in 4.13 with ECR.

 

The current AWS policy that's used is this one: `arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly`

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetAuthorizationToken",
                "ecr:BatchCheckLayerAvailability",
                "ecr:GetDownloadUrlForLayer",
                "ecr:GetRepositoryPolicy",
                "ecr:DescribeRepositories",
                "ecr:ListImages",
                "ecr:DescribeImages",
                "ecr:BatchGetImage",
                "ecr:GetLifecyclePolicy",
                "ecr:GetLifecyclePolicyPreview",
                "ecr:ListTagsForResource",
                "ecr:DescribeImageScanFindings"
            ],
            "Resource": "*"
        }
    ]
} 

 

 

Description of problem:

Looking at the telemetry data for Nutanix I noticed that the “host_type” for clusters installed with platform nutanix shows as “virt-unknown”. Do you know what needs to happen in the code to tell telemetry about host type being Nutanix? The problem is that we can’t track those installations with platform none, just IPI.

Refer to the slack thread https://redhat-internal.slack.com/archives/C0211848DBN/p1687864857228739.

Version-Release number of selected component (if applicable):

 

How reproducible:

Always

Steps to Reproduce:

Create an OCP Nutanix cluster

Actual results:

The telemetry data for Nutanix shows the “host_type” for the nutanix cluster as “virt-unknown”.

Expected results:

The telemetry data for Nutanix shows the “host_type” for the nutanix cluster as "nutanix".

Additional info:

 

Description of problem:

The Samples tab is not visible when the Sample Deployment (a ConsoleYAMLSample) is created, whereas the Snippets tab is visible when `snippet: true` is added to the Sample Deployment.
Check the attached file for exact details.
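
For reference, a hedged sketch of the resource described above; the names and the embedded Deployment are illustrative and not the attached reproducer:

apiVersion: console.openshift.io/v1
kind: ConsoleYAMLSample
metadata:
  name: example-deployment-sample
spec:
  targetResource:
    apiVersion: apps/v1
    kind: Deployment
  title: Example Deployment
  description: A sample Deployment definition.
  snippet: false    # with snippet false (or omitted), the entry is expected under the Samples tab
  yaml: |
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: example
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: example
      template:
        metadata:
          labels:
            app: example
        spec:
          containers:
          - name: example
            image: registry.access.redhat.com/ubi8/ubi-minimal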

Version-Release number of selected component (if applicable):

4.11.x

How reproducible:

Always

Steps to Reproduce:

1. On CLI, create the Sample Deployment
2. On Web console, create a Deployment
3. Deployment will be created with details mentioned in Sample Deployment.
4. Samples tab must be visible in YAML view on web console
5. Screenshots are attached for reference.

Actual results:

When a Sample Deployment is created with `kind: ConsoleYAMLSample` and `snippet: true`, the Snippets tab shows up. When a Sample Deployment is created with the same details but without `snippet: true`, the "Samples" tab does not show up.

Expected results:

When a Sample Deployment is created with the `kind: ConsoleYAMLSample` and NO `snippet:true`, the "Samples" tab must show up.

Additional info:

When a Sample Deployment is created with `kind: ConsoleYAMLSample`, the "Samples" tab shows up in OCP cluster version 4.10.x; however, it doesn't show up in OCP cluster version 4.11.x.

NOTE: The attached file has all the required details.

 

 

 

Description of problem:

migrator pod in `openshift-kube-storage-version-migrator` project stuck in Pending state

Version-Release number of selected component (if applicable):

4.12

How reproducible:

100%

Steps to Reproduce:

1. Add a default cluster-wide node selector with a label that do not match with any node label:
   $ oc edit scheduler cluster
   apiVersion: config.openshift.io/v1
   kind: Scheduler
   metadata:
     name: cluster
   ...
   spec:
     defaultNodeSelector: node-role.kubernetes.io/role=app
     mastersSchedulable: false

2. Delete the migrator pod running in the `openshift-kube-storage-version-migrator`
   $ oc delete pod migrator-6b78665974-zqd47 -n openshift-kube-storage-version-migrator

3. Check if the migrator pod comes up in running state or not.
   $ oc get pods -n openshift-kube-storage-version-migrator
   NAME                        READY   STATUS    RESTARTS   AGE
   migrator-6b78665974-j4jwp   0/1     Pending   0          2m41s

Actual results:

The pod goes into the pending state because it tries to get scheduled on the node having label `node-role.kubernetes.io/role=app`.

Expected results:

The pod should come up in running state, it should not get affected by the cluster-wide node-selector.

Additional info:

Setting the annotation `openshift.io/node-selector=` on the `openshift-kube-storage-version-migrator` project and then deleting the pending migrator pod brings the pod up.

The expectation with this bug is that the project `openshift-kube-storage-version-migrator` should ship with the annotation `openshift.io/node-selector=`, so that pods running in this project are not affected by a wrong cluster-wide node-selector configuration.
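
A minimal sketch of the expected namespace shape (illustrative only; the namespace itself is delivered by the payload, so the fix belongs in its manifest rather than in a manually applied object):

apiVersion: v1
kind: Namespace
metadata:
  name: openshift-kube-storage-version-migrator
  annotations:
    openshift.io/node-selector: ""   # an empty project node selector overrides the cluster-wide defaultNodeSelector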

`ec2:ReleaseAddress` is documented as a required permission for the NodePool management policy: https://github.com/openshift/hypershift/blob/main/api/v1beta1/hostedcluster_types.go#L1285

 

This is too permissive and the permission will at least need a condition to scope it. However, it may not be used by the NodePool controller at all. In that case, this permission should be removed.

 

Done Criteria:

  • Determine if ec2:ReleaseAddress is required for NodePool management in Hypershift
  • If not required, remove the permission from documentation

Currently the 'dump cluster' command requires public access to the guest cluster to dump its contents. It should be possible for it to access the guest cluster via the kube-apiserver service on the mgmt cluster. This would enable it for private clusters as well.

This is a clone of issue OCPBUGS-9331. The following is the description of the original issue:

Version: 4.11.0-0.nightly-2022-06-22-015220

$ openshift-install version
openshift-install 4.11.0-0.nightly-2022-06-22-015220
built from commit f912534f12491721e3874e2bf64f7fa8d44aa7f5
release image registry.ci.openshift.org/ocp/release@sha256:9c2e9cafaaf48464a0d27652088d8fb3b2336008a615868aadf8223202bdc082
release architecture amd64

Platform: OSP 16.1.8 with manila service

Please specify:

  • IPI

What happened?

In a fresh 4.11 cluster (with Kuryr, but that shouldn't be related to the issue), there are no endpoints for manila metrics:

> $ oc -n openshift-manila-csi-driver get endpoints
NAME ENDPOINTS AGE
manila-csi-driver-controller-metrics <none> 3h7m

> $ oc -n openshift-manila-csi-driver describe endpoints
Name: manila-csi-driver-controller-metrics
Namespace: openshift-manila-csi-driver
Labels: app=manila-csi-driver-controller-metrics
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2022-06-22T10:30:06Z
Subsets:
Events: <none>

> $ oc -n openshift-manila-csi-driver get all
NAME READY STATUS RESTARTS AGE
pod/csi-nodeplugin-nfsplugin-4mqgx 1/1 Running 0 3h7m
pod/csi-nodeplugin-nfsplugin-555ns 1/1 Running 0 3h2m
pod/csi-nodeplugin-nfsplugin-bn26j 1/1 Running 0 3h7m
pod/csi-nodeplugin-nfsplugin-lfsm7 1/1 Running 0 3h1m
pod/csi-nodeplugin-nfsplugin-xwxnz 1/1 Running 0 3h1m
pod/csi-nodeplugin-nfsplugin-zqnkt 1/1 Running 0 3h7m
pod/openstack-manila-csi-controllerplugin-7fc4b4f56d-ddn25 6/6 Running 2 (158m ago) 3h7m
pod/openstack-manila-csi-controllerplugin-7fc4b4f56d-p9jss 6/6 Running 0 3h6m
pod/openstack-manila-csi-nodeplugin-6w426 2/2 Running 0 3h2m
pod/openstack-manila-csi-nodeplugin-fvsjr 2/2 Running 0 3h7m
pod/openstack-manila-csi-nodeplugin-g9x4t 2/2 Running 0 3h1m
pod/openstack-manila-csi-nodeplugin-gp76x 2/2 Running 0 3h7m
pod/openstack-manila-csi-nodeplugin-n9v9t 2/2 Running 0 3h7m
pod/openstack-manila-csi-nodeplugin-s6srv 2/2 Running 0 3h1m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/manila-csi-driver-controller-metrics ClusterIP 172.30.118.232 <none> 443/TCP,444/TCP 3h7m

NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/csi-nodeplugin-nfsplugin 6 6 6 6 6 <none> 3h7m
daemonset.apps/openstack-manila-csi-nodeplugin 6 6 6 6 6 <none> 3h7m

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/openstack-manila-csi-controllerplugin 2/2 2 2 3h7m

NAME DESIRED CURRENT READY AGE
replicaset.apps/openstack-manila-csi-controllerplugin-5697ccfcbf 0 0 0 3h7m
replicaset.apps/openstack-manila-csi-controllerplugin-7fc4b4f56d 2 2 2 3h7m

This can lead to not being able to retrieve manila metrics.

openshift_install.log: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/j2pg/DFG-osasinfra-shiftstack_periodic_subjob-ocp_install-4.11-kuryr-ipi/15/undercloud-0/home/stack/ostest/.openshift_install.log.gz

must_gather: http://rhos-ci-logs.lab.eng.tlv2.redhat.com/logs/j2pg/DFG-osasinfra-shiftstack_periodic_subjob-ocp_install-4.11-kuryr-ipi/15/infrared/.workspaces/workspace_2022-06-22_09-58-34/must-gather-install.tar.gz/

cinder-csi, for example, is configured with such endpoints:

> $ oc -n openshift-cluster-csi-drivers get endpoints
NAME ENDPOINTS AGE
openstack-cinder-csi-driver-controller-metrics 10.196.1.100:9203,10.196.2.82:9203,10.196.1.100:9205 + 5 more... 3h15m

> $ oc -n openshift-cluster-csi-drivers describe endpoints
Name: openstack-cinder-csi-driver-controller-metrics
Namespace: openshift-cluster-csi-drivers
Labels: app=openstack-cinder-csi-driver-controller-metrics
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2022-06-22T10:58:57Z
Subsets:
Addresses: 10.196.1.100,10.196.2.82
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
attacher-m 9203 TCP
snapshotter-m 9205 TCP
provisioner-m 9202 TCP
resizer-m 9204 TCP

Events: <none>

I recently noticed that the cluster-autoscaler pod in the hosted control plane namespace is continuously restarting. Upon investigating the issue, I found the liveness and readiness probes failing on this pod.

Also, checking the logs of this pod further points to missing RBAC for the cluster-autoscaler in this case. Please see the log trace below for reference.

E0215 14:52:59.936182 1 reflector.go:140] k8s.io/client-go/dynamic/dynamicinformer/informer.go:91: Failed to watch *unstructured.Unstructured: failed to list *unstructured.Unstructured: agentmachinetemplates.capi-provider.agent-install.openshift.io is forbidden: User "system:serviceaccount:clusters-hcp01:cluster-autoscaler" cannot list resource "agentmachinetemplates" in API group "capi-provider.agent-install.openshift.io" in the namespace "clusters-hcp01"
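
As a hedged sketch only (the API group, resource, and namespace come from the log above; the actual fix belongs in the HyperShift-managed RBAC for the autoscaler), the missing permission has roughly this shape, bound to the cluster-autoscaler service account via a RoleBinding:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cluster-autoscaler-agentmachinetemplates
  namespace: clusters-hcp01
rules:
- apiGroups:
  - capi-provider.agent-install.openshift.io
  resources:
  - agentmachinetemplates
  verbs:
  - get
  - list
  - watch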

In the interest of shipping 4.13, we landed a snapshot of nmstate code with some logic for NIC name pinning.

 

In https://github.com/nmstate/nmstate/commit/03c7b03bd4c9b0067d3811dbbf72635201519356 a few changes were made.

 

TODO elaborate in this issue what bugs are fixed

 

This issue is tracking the merge of https://github.com/openshift/machine-config-operator/pull/3685, which also aims to ensure 4.14 compatibility.

Description of problem:

OLMv0 over-uses listers and consumes too much memory. Also, $GOMEMLIMIT is not used and the runtime overcommits on RSS. See the following doc for more detail:

https://docs.google.com/document/d/11J7lv1HtEq_c3l6fLTWfsom8v1-7guuG4DziNQDU6cY/edit#heading=h.ttj9tfltxgzt
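
As one hedged illustration of wiring $GOMEMLIMIT (the container name, image, and limit below are placeholders, not the actual OLM manifests), the value can be derived from the container's memory limit via the downward API so the Go runtime stays within the pod's RSS budget:

template:
  spec:
    containers:
    - name: olm-operator
      image: example.io/olm-operator:latest
      env:
      - name: GOMEMLIMIT
        valueFrom:
          resourceFieldRef:
            containerName: olm-operator
            resource: limits.memory    # exposed as a plain byte count, which GOMEMLIMIT accepts
      resources:
        limits:
          memory: 200Mi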

Version-Release number of selected component (if applicable):

 

How reproducible:

 

Steps to Reproduce:

1.
2.
3.

Actual results:

 

Expected results:

 

Additional info: