Back to index

4.7.0-0.okd-2023-02-28-020538

Jump to: Complete Features | Incomplete Features | Complete Epics | Other Complete

Changes from 4.6.0-0.okd-2023-03-31-010121

Note: this page shows the Feature-Based Change Log for a release

Complete Features

These features were completed when this image was assembled

The details of this Jira Card are restricted (Only Red Hat employees and contractors)

Epic Goal

  • Enable the migration from in-tree storage drivers to CSI-based drivers with minimal impact to end users, applications, and the cluster
  • These migrations would include, but are not limited to:
    • CSI driver for AWS EBS
    • CSI driver for GCP
    • CSI driver for Azure (file and disk)
    • CSI driver for VMware vSphere

Why is this important?

  • OpenShift needs to maintain its ability to enable PVCs and PVs of the main storage types
  • CSI migration is getting close to GA; we need the feature fully tested and enabled in OpenShift
  • Upstream in-tree drivers are being deprecated to make way for the CSI drivers, prior to in-tree driver removal

Scenarios

  1. User-initiated move from in-tree to CSI driver
  2. Upgrade-initiated move from in-tree to CSI driver
  3. Upgrade from EUS to EUS

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>
The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

On new installations, we should make the StorageClass created by the CSI operator the default one.

However, we shouldn't do that in an upgrade scenario, mainly because users might have set a different quota on the CSI driver's StorageClass.

Exit criteria:

  • New clusters get the CSI Storage Class as the default one.
  • Existing clusters don't get their default Storage Classes changed.
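A minimal sketch of the install-time logic described above, in Go, assuming a client-go clientset; the function name is illustrative. Kubernetes marks a default StorageClass with the storageclass.kubernetes.io/is-default-class annotation, and skipping the annotation whenever any default already exists is what keeps upgraded clusters untouched:

package defaultsc

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const defaultAnnotation = "storageclass.kubernetes.io/is-default-class"

// ensureDefaultStorageClass marks the CSI StorageClass as default only when
// no default exists yet, i.e. on a fresh installation.
func ensureDefaultStorageClass(ctx context.Context, client kubernetes.Interface, csiClassName string) error {
	classes, err := client.StorageV1().StorageClasses().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, sc := range classes.Items {
		if sc.Annotations[defaultAnnotation] == "true" {
			return nil // a default already exists (upgrade scenario), leave it alone
		}
	}
	sc, err := client.StorageV1().StorageClasses().Get(ctx, csiClassName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	if sc.Annotations == nil {
		sc.Annotations = map[string]string{}
	}
	sc.Annotations[defaultAnnotation] = "true"
	_, err = client.StorageV1().StorageClasses().Update(ctx, sc, metav1.UpdateOptions{})
	return err
}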

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Rebase OpenShift components to k8s v1.24

Why is this important?

  • Rebasing ensures components work with the upcoming release of Kubernetes
  • Address tech debt related to upstream deprecations and removals.

Scenarios

  1. ...

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. k8s 1.24 release

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

 

Why?

  • Decouple control and data plane. 
    • Customers do not pay Red Hat more to run HyperShift control planes and supporting infrastructure than Standalone control planes and supporting infrastructure.
  • Improve security
    • Shift credentials that support the operation of the core platform, as opposed to workloads, out of the cluster
  • Improve cost
    • Allow a user to toggle what they don’t need.
    • Ensure a smooth path to scale to 0 workers and upgrade with 0 workers.

 

Assumption

  • A customer will be able to associate a cluster as “Infrastructure only”
    • E.g. one option: the management cluster has role=master and role=infra nodes only, and control planes are packed onto role=infra nodes
    • OR the entire cluster is labeled infrastructure, and node roles are ignored
  • Anything that runs on a master node by default in Standalone that is present in HyperShift MUST be hosted and not run on a customer worker node.

 

 

Doc: https://docs.google.com/document/d/1sXCaRt3PE0iFmq7ei0Yb1svqzY9bygR5IprjgioRkjc/edit 

Overview 

Customers do not pay Red Hat more to run HyperShift control planes and supporting infrastructure than Standalone control planes and supporting infrastructure.

Assumption

  • A customer will be able to associate a cluster as “Infrastructure only”
    • E.g. one option: the management cluster has role=master and role=infra nodes only, and control planes are packed onto role=infra nodes
    • OR the entire cluster is labeled infrastructure, and node roles are ignored
  • Anything that runs on a master node by default in Standalone that is present in HyperShift MUST be hosted and not run on a customer worker node.

DoD 

Run cluster-storage-operator (CSO) + AWS EBS CSI driver operator + AWS EBS CSI driver control-plane Pods in the management cluster; run the driver DaemonSet in the hosted cluster.

More information here: https://docs.google.com/document/d/1sXCaRt3PE0iFmq7ei0Yb1svqzY9bygR5IprjgioRkjc/edit 

 

As a HyperShift Cluster Instance Admin, I want to run the AWS EBS CSI driver operator and the control plane of the CSI driver in the management cluster, so that the guest cluster runs just my applications.

  • Add a new cmdline option for the guest cluster kubeconfig file location
  • Parse both kubeconfigs:
    • One from the projected service account, which leads to the management cluster.
    • A second from the new cmdline option introduced above; this one leads to the guest cluster.
  • Only on HyperShift:
    • When interacting with Kubernetes API, carefully choose the right kubeconfig to watch / create / update objects in the right cluster.
    • Replace namespaces in all Deployments and other objects that are created in the management cluster. They must be created in the same namespace as the operator.
    • Pass only the guest kubeconfig to the operand (the control-plane Deployment of the CSI driver); see the sketch below.
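A minimal sketch of the two-kubeconfig pattern above, in Go; the --guest-kubeconfig flag name is hypothetical. The in-cluster config reaches the management cluster, the flag reaches the guest cluster:

package main

import (
	"flag"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Hypothetical flag name; the real option is whatever the operator adds.
	guestKubeconfig := flag.String("guest-kubeconfig", "", "path to the guest cluster kubeconfig")
	flag.Parse()

	// Management cluster: the operator pod's projected service account.
	mgmtCfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	mgmtClient := kubernetes.NewForConfigOrDie(mgmtCfg)

	// Guest cluster: the kubeconfig passed on the command line.
	guestCfg, err := clientcmd.BuildConfigFromFlags("", *guestKubeconfig)
	if err != nil {
		panic(err)
	}
	guestClient := kubernetes.NewForConfigOrDie(guestCfg)

	// Control-plane Deployments and Secrets go through mgmtClient (into the
	// operator's own namespace); watches of storage objects in the guest
	// cluster go through guestClient.
	_, _ = mgmtClient, guestClient
}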

Exit criteria:

  • Control plane Deployment of AWS EBS CSI driver runs in the management cluster in HyperShift.
  • Storage works in the guest cluster.
  • No regressions in standalone OCP.

Feature Overview

Enable sharing ConfigMap and Secret across namespaces

Requirements

Requirement Notes isMvp?
Secrets and ConfigMaps can get shared across namespaces   YES

Questions to answer…

NA

Out of Scope

NA

Background, and strategic fit

Consumption of RHEL entitlements has been a challenge on OCP 4 since it moved to a cluster-based entitlement model, compared to the node-based (RHEL subscription manager) entitlement model of OCP 3. In order to provide a sufficiently similar experience to OCP 3, the entitlement certificates that are made available on the cluster (OCPBU-93) should be shared across namespaces. This prevents the need for cluster admins to copy these entitlements into each namespace, which leads to additional operational challenges when updating and refreshing them.

Documentation Considerations

Questions to be addressed:
 * What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc)?
 * Does this feature have doc impact?
 * New Content, Updates to existing content, Release Note, or No Doc Impact
 * If unsure and no Technical Writer is available, please contact Content Strategy.
 * What concepts do customers need to understand to be successful in [action]?
 * How do we expect customers will use the feature? For what purpose(s)?
 * What reference material might a customer want/need to complete [action]?
 * Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
 * What is the doc impact (New Content, Updates to existing content, or Release Note)?

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

<--- Cut-n-Paste the entire contents of this description into your new Epic --->

Epic Goal

  • Allow CSI volumes to be mounted into a build

Why is this important?

  • CSI volumes allow data to be mounted into containers via ephemeral CSI volumes
  • Ephemeral CSI volumes are provided by CSI drivers that support this capability
  • When using sensitive credentials in a build, accessing secrets as a mounted volume ensures that these credentials are not present in the resulting container image (see the sketch below)
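For context, a rough Go sketch of what an ephemeral CSI volume source looks like as an API object; the driver name and volume attributes are placeholders, and how a build references such a volume is defined by the build API work in this epic:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	readOnly := true
	src := corev1.CSIVolumeSource{
		Driver:   "csi.example.com", // hypothetical driver name
		ReadOnly: &readOnly,
		VolumeAttributes: map[string]string{
			"secretName": "my-build-credentials", // hypothetical attribute
		},
	}
	fmt.Printf("%+v\n", src)
	// The kubelet tears the volume down when the pod (here, the build pod)
	// terminates, so the mounted content never ends up in the image.
}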

Scenarios

  1. Access private artifact repositories (Artifactory, JFrog, Maven)
  2. Download RHEL packages in a build

Acceptance Criteria

  • Builds can mount a CSI volume in a build
  • Content in the CSI volume is not present in the resulting container image.
  • If SCCs do not support fine-grained controls over CSI volumes, provide this feature on a TechPreview basis with a feature gate.

Dependencies (internal and external)

  1. Buildah - support mounting of volumes when building with a Dockerfile
  2. (optional) Auth - use SCCs to control which CSI drivers are allowed to be used with ephemeral CSI volumes.

Previous Work (Optional):

  1. BUILD-257 - Build Resource Volume Mounts

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User Story

As an OpenShift cluster admin
I want to use the TechPreview feature set to try out CSI volumes in builds
So that I can test using CSI volumes in builds through tech preview features

Minimum Acceptance Criteria

  • Openshift-controller-manager-operator reads the cluster feature gate (BuildCSIVolumes).
  • If the feature gate for Build CSI volumes is enabled, the operator enables configuration in openshift-controller-manager to turn on CSI volumes for the build controller.
  • CI testing verifies that we pass the appropriate feature gate to the build controller when tech preview is enabled.

Docs Impact

Product documentation will not be required until BUILD-275 is complete. Documentation for CSI volumes in builds will need to note that the TechPreviewNoUpgrade feature set needs to be enabled on the cluster.

PX Impact

Additional training enablement materials may not be needed - product docs should be sufficient.

QE Impact

Full e2e testing may not be feasible until BUILD-275 is completed.
CI testing should verify that the appropriate configuration values were passed to the build controller.

We will likely need a new CI job that installs the cluster with tech preview enabled before we verify that the BuildCSIVolumes feature gate has been enabled.

Open Questions

  • Should ocm-o mark itself Upgradeable=false if we detect the BuildCSIVolumes feature gate has been enabled?

Notes

OpenShift already has feature gates baked into the core platform via the FeatureGate API object. For this feature, we need to declare a feature gate that is added to the TechPreviewNoUpgrade feature set, which openshift-controller-manager-operator then reads and applies to the build controller.

Feature gate needs to be proposed to openshift/api (add to the TechPreviewNoUpgrade feature set).
An example PR on how to do this: https://github.com/openshift/api/pull/982.
Once approved, the updated tech preview feature set needs to be vendored into openshift/library-go.
Openshift-controller-manager-operator needs to read the feature gate and pass it on to the build controller.
The build controller has its own configuration "API" - this was a relic of the 3.x master configuration that is not exposed to admins in OCP 4.x: https://github.com/openshift/api/blob/master/openshiftcontrolplane/v1/types.go#L198-L207
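A rough sketch of the gate check in Go, assuming openshift/client-go; BuildCSIVolumes is the proposed gate name and may change before it merges into openshift/api:

package main

import (
	"context"
	"fmt"

	configv1 "github.com/openshift/api/config/v1"
	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := configclient.NewForConfigOrDie(cfg)

	fg, err := client.ConfigV1().FeatureGates().Get(context.TODO(), "cluster", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	// TechPreviewNoUpgrade enables the whole tech-preview set, which would
	// include a BuildCSIVolumes gate once it lands in openshift/api.
	if fg.Spec.FeatureSet == configv1.TechPreviewNoUpgrade {
		fmt.Println("tech preview enabled: turn on CSI volumes in the build controller")
	}
}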

A separate operator checks whether a *NoUpgrade feature set is enabled, and if so marks the cluster as unable to be upgraded to the next minor OCP release.

To test this in CI, we need a suite that runs with the TechPreviewNoUpgrade feature set enabled. The step registry has primitives which bring up a cluster with tech preview features enabled. We will need to update ocm-o's CI configuration to run our operator tests with tech preview enabled. Testing for this specific feature will need to have separate logic that verifies we are sending the right configuration to the build controller under normal and TechPreview mode.

 

Existing techpreview CI step registry setups (note the per cloud elements, which make sense, since the existing CSI drivers are per cloud):

./ci-operator/step-registry/ipi/aws/pre/techpreview
./ci-operator/step-registry/ipi/azure/pre/techpreview
./ci-operator/step-registry/ipi/conf/aws/techpreview
./ci-operator/step-registry/ipi/conf/azure/techpreview
./ci-operator/step-registry/ipi/conf/techpreview
./ci-operator/step-registry/ipi/conf/openstack/techpreview
./ci-operator/step-registry/ipi/openstack/pre/techpreview
./ci-operator/step-registry/openshift/e2e/aws/techpreview
./ci-operator/step-registry/openshift/e2e/gcp/techpreview
./ci-operator/step-registry/openshift/e2e/azure/techpreview
./ci-operator/step-registry/openshift/e2e/openstack/techpreview
./ci-operator/step-registry/openshift/e2e/vsphere/techpreview

 

Given that shared resources span all clouds, does that mean we touch each of these, create a new one, or both?

Incomplete Features

When this image was assembled, these features were not yet completed. Therefore, only the Jira Cards included here are part of this release

Goal:

As an administrator, I would like to deploy OpenShift 4 clusters to AWS C2S region

Problem:

Customers were able to deploy to the AWS C2S region in OCP 3.11, but our global configuration in OCP 4.1 doesn't support this.

Why is this important:

  • Many of our public sector customers would like to move off 3.11 and on to 4.1, but missing support for the AWS C2S region will prevent them from being able to migrate their environments.

Lifecycle Information:

  • Core

Previous Work:

Here are the relevant PRs from OCP 3.11. You can see that these endpoints are not part of the standard SDK (they use an entirely separate SDK). To support these regions, the endpoints had to be configured explicitly.

Seth Jennings has put together a highly customized POC.

Dependencies:

  • Custom API endpoint support w/ CA
    • Cloud / Machine API
    • Image Registry
    • Ingress
    • Kube Controller Manager
    • Cloud Credential Operator
    • others?
  • Require access to local/private/hidden AWS environment

 

Prioritized epics + deliverables (in scope / not in scope):

  • Allow AWS C2S region to be specified for OpenShift cluster deployment
  • Enable customers to use their own managed internal/cluster DNS solutions due to provider and operational restrictions
  • Document deploying OpenShift to AWS C2S region
  • Enable CI for the AWS C2S region

Related : https://jira.coreos.com/browse/CORS-1271

Estimate (XS, S, M, L, XL, XXL): L

 

Customers: North America Public Sector and Government Agencies

Open Questions:

We need to continue to maintain specific areas within storage, this is to capture that effort and track it across releases.

Goals

  • To allow OCP users and cluster admins to detect problems early and with as little interaction with Red Hat as possible.
  • When Red Hat is involved, make sure we have all the information we need from the customer, i.e. in metrics / telemetry / must-gather.
  • Reduce storage test flakiness so we can spot real bugs in our CI.

Requirements

Requirement Notes isMvp?
Telemetry   No
Certification   No
API metrics   No

Out of Scope

n/a

Background, and strategic fit
With the expected scale of our customer base, we want to keep the load of customer tickets / BZs low.

Assumptions

Customer Considerations

Documentation Considerations

  • Target audience: internal
  • Updated content: none at this time.

Notes

In progress:

  • CI flakes:
    • Configurable timeouts for e2e tests
      • Azure is slow and times out often
      • Cinder times out formatting volumes
      • AWS resize test times out

 

High prio:

  • Env. check tool for VMware - users often mis-configure permissions there and blame OpenShift. If we had a tool they could run, it might report better errors.
    • Should it be part of the installer?
    • Spike exists
  • Add / use cloud API call metrics
    • Helps customers to understand why things are slow
    • Helps build cop to understand a flake
      • With a post-install step that filters data from Prometheus that’s still running in the CI job.
    • Ideas:
      • Cloud is throttling X% of API calls longer than Y seconds
      • Attach / detach / provisioning / deletion / mount / unmount / resize takes longer than X seconds?
    • Capture metrics of operations that are stuck and won’t finish.
      • Sweep operation map from executioner???
      • Report operation metrics into the highest bucket after the bucket threshold (i.e. if 10 minutes is the last bucket, report an operation into this bucket after 10 minutes and don’t wait for its completion)? See the sketch below.
      • Ask the monitoring team?
    • Include in CSI drivers too.
      • With alerts too
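A sketch of the "report into the highest bucket" idea with prometheus/client_golang; the metric name and bucket boundaries are illustrative:

package metrics

import "github.com/prometheus/client_golang/prometheus"

// The last bucket boundary (600s = 10 minutes) doubles as the threshold for
// operations we consider stuck.
var opDuration = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "cloud_operation_duration_seconds",
		Help:    "Duration of cloud API operations.",
		Buckets: []float64{1, 5, 15, 60, 300, 600},
	},
	[]string{"operation"},
)

func init() { prometheus.MustRegister(opDuration) }

// reportStuck records an operation that crossed the last bucket boundary and
// still has not finished, so it lands in the highest bucket instead of never
// being observed at all.
func reportStuck(operation string) {
	opDuration.WithLabelValues(operation).Observe(600)
}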

Unsorted

  • As the number of storage operators grows, it would be good to have a Grafana board for storage operators
    • CSI driver metrics (from CSI sidecars + the driver itself  + its operator?)
    • CSI migration?
  • Get aggregated logs in cluster
    • They're rotated too soon
    • No logs from dead / restarted pods
    • No tools to combine logs from multiple pods (e.g. 3 controller managers)
  • What storage issues do customers have? Storage was 22% of all issues.
    • Insufficient docs?
    • Probably garbage
  • Document basic storage troubleshooting for our support teams
    • Which logs are useful when, and what log level to use
    • This has been discussed during the GSS weekly team meeting; however, it would be beneficial to have this documented.
  • Common vSphere errors, their debugging and fixing. 
  • Document sig-storage flake handling - not all failed [sig-storage] tests are ours

Epic Goal

  • Update all images that we ship with OpenShift to the latest upstream releases and libraries.
  • The exact content of what needs to be updated will be determined as new images are released upstream, which is not known at the beginning of OCP development work. We don't know what new features will be included and should be tested and documented; new CSI driver releases especially may bring new, currently unknown features. We expect the amount of work to be roughly the same as in previous releases. Of course, QE or docs can reject an update if it's too close to the deadline and/or looks too big.

Traditionally we did these updates as bugfixes, because we did them after the feature freeze (FF). We are trying no-feature-freeze in 4.12: we will do as much as we can before FF, but we're quite sure something will slip past FF as usual.

Why is this important?

  • We want to ship the latest software that contains new features and bugfixes.

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.

Update all CSI sidecars to the latest upstream release.

  • external-attacher
  • external-provisioner
  • external-resizer
  • external-snapshotter
  • node-driver-registrar
  • livenessprobe

This includes an update of the VolumeSnapshot CRDs in https://github.com/openshift/cluster-csi-snapshot-controller-operator/tree/master/assets

Update all OCP and kubernetes libraries in storage operators to the appropriate version for OCP release.

This includes (but is not limited to):

  • Kubernetes:
    • client-go
    • controller-runtime
  • OCP:
    • library-go
    • openshift/api
    • openshift/client-go
    • operator-sdk

Operators:

  • aws-ebs-csi-driver-operator 
  • aws-efs-csi-driver-operator
  • azure-disk-csi-driver-operator
  • azure-file-csi-driver-operator
  • openstack-cinder-csi-driver-operator
  • gcp-pd-csi-driver-operator
  • gcp-filestore-csi-driver-operator
  • manila-csi-driver-operator
  • ovirt-csi-driver-operator
  • vmware-vsphere-csi-driver-operator
  • alibaba-disk-csi-driver-operator
  • ibm-vpc-block-csi-driver-operator
  • csi-driver-shared-resource-operator

 

  • cluster-storage-operator
  • csi-snapshot-controller-operator
  • local-storage-operator
  • vsphere-problem-detector

Complete Epics

This section includes Jira cards that are linked to an Epic, but the Epic itself is not linked to any Feature. These epics were completed when this image was assembled

 

Monitoring needs to be reliable and is very useful when trying to debug clusters in an already degraded state. We want to ensure that metrics scraping can always work if the scraper can reach the target, even if the kube-apiserver is unavailable or unreachable. To do this, we will combine a local authorizer (already merged in many binaries and in kube-rbac-proxy) and client-cert based authentication to get a fully local authentication and authorization path for scrape targets.

If networking (or part of networking) is down and a scrape target cannot reach the kube-apiserver to verify a token and a SubjectAccessReview, then the metrics scraper can be rejected. The SubjectAccessReview (authorization) side is already largely addressed, but service account tokens are still used for scraping targets. Tokens require an external network call that we can avoid by using client certificates. Gathering metrics, especially client metrics, from partially functional clusters helps narrow the search area between kube-apiserver, etcd, kubelet, and SDN considerably.

In addition, this will significantly reduce the load on the kube-apiserver. We have observed in the CI cluster that token and subject access reviews are a significant percentage of all kube-apiserver traffic.
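A minimal sketch of the client-certificate side in Go, with illustrative mount paths; the TLS handshake verifies the scraper's certificate against a CA bundle, so accepting a scrape requires no TokenReview or SubjectAccessReview round trip to the kube-apiserver:

package main

import (
	"crypto/tls"
	"crypto/x509"
	"net/http"
	"os"
)

func main() {
	// CA bundle that signs the scraper's client certificates.
	caPEM, err := os.ReadFile("/etc/tls/metrics-client-ca.crt")
	if err != nil {
		panic(err)
	}
	caPool := x509.NewCertPool()
	caPool.AppendCertsFromPEM(caPEM)

	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		// By the time we get here the handshake has already authenticated
		// the client; authorization can stay local as well.
		w.Write([]byte("# metrics would be served here\n"))
	})

	server := &http.Server{
		Addr: ":8443",
		TLSConfig: &tls.Config{
			ClientAuth: tls.RequireAndVerifyClientCert, // fully local authentication
			ClientCAs:  caPool,
		},
	}
	panic(server.ListenAndServeTLS("/etc/tls/tls.crt", "/etc/tls/tls.key"))
}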

Acceptance Criteria

  • CI - MUST be running successfully with tests automated
  • Release Technical Enablement - Provide necessary release enablement details and documents.
  • ...

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.
  • Release Enablement <link to Feature Enablement Presentation>
  • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
  • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
  • DEV - Downstream build attached to advisory: <link to errata>
  • QE - Test plans in Polarion: <link or reference to Polarion>
  • QE - Automated tests merged: <link or reference to automated tests>
  • DOC - Downstream documentation merged: <link to meaningful PR>

User story:

As the cluster-policy-controller, I automatically approve certificate signing requests issued by monitoring.

DoD:

  • cert signing requests issued by the cluster-monitoring-operator service account are approved automatically.

Implementation hints: leverage approving logic implemented in https://github.com/openshift/library-go/pull/1083.
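A hypothetical sketch of the approval step itself, using client-go; validating that the CSR really comes from the cluster-monitoring-operator service account is assumed to happen before this call, using the library-go logic linked above:

package approver

import (
	"context"

	certificatesv1 "k8s.io/api/certificates/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// approve adds an Approved condition and submits it through the approval
// subresource of the CertificateSigningRequest API.
func approve(ctx context.Context, client kubernetes.Interface, csr *certificatesv1.CertificateSigningRequest) error {
	csr.Status.Conditions = append(csr.Status.Conditions, certificatesv1.CertificateSigningRequestCondition{
		Type:    certificatesv1.CertificateApproved,
		Status:  corev1.ConditionTrue,
		Reason:  "AutoApproved",
		Message: "monitoring client certificate request approved automatically",
	})
	_, err := client.CertificatesV1().CertificateSigningRequests().UpdateApproval(ctx, csr.Name, csr, metav1.UpdateOptions{})
	return err
}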

The details of this Jira Card are restricted (Red Hat Employee and Contractors only)

User Story

As a cluster administrator of a disconnected OCP cluster,
I want a list of possible sample images to mirror
So that I can configure my image mirror prior to installing OCP in a disconnected environment.

Acceptance Criteria

  • Publish the list of the sample images to mirror as a ConfigMap in the samples operator namespace.
  • Provide instructions on how to obtain the current image SHAs from the list above (via podman or skopeo).
  • Reference the ConfigMap name in our "import failing" alert.
  • [optional] Reference the ConfigMap name in our "Removed" condition message.

Notes

It is too onerous to find a connected cluster in order to obtain the list of possible sample images to mirror using the current documented procedures.
I need a list made available to me in my disconnected cluster that I can reference after the initial install.

User Story

The samples operator's use of the reason field in its config object to track imagestream import completion has resulted in that singleton being a bottleneck and a source of update conflicts (we are talking 60 or 70 imagestreams potentially updating that one field concurrently).

Acceptance Criteria

  • Reduction in reconciliation errors when imagestream imports complete (success or failure).
  • ConfigMaps containing failing imagestream imports are added to must-gather
  • After all imagestreams successfully import, there are no error-recording ConfigMaps in the samples-operator namespace.

Notes

See this hackday PR for an alternative approach that uses a ConfigMap per imagestream.

Goal:

Record the lastImageChangeTriggeredID field in the BuildConfig status instead of the spec.

Problem:

BuildConfigs use the lastImageChangeTriggeredID field to record the SHA of the last image used to trigger a build. This information should be managed only by the build controllers and placed in the BuildConfig status. By keeping the data in spec, cluster admins and developers can easily alter or remove this information.

Why is this important?

Moving this information to status is a prerequisite for using GitOps to manage BuildConfigs with ImageChange triggers.
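For illustration, this is roughly how the ID is read from spec today, assuming openshift/client-go; the namespace and BuildConfig name are placeholders:

package main

import (
	"context"
	"fmt"

	buildv1 "github.com/openshift/api/build/v1"
	buildclient "github.com/openshift/client-go/build/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := buildclient.NewForConfigOrDie(cfg)

	bc, err := client.BuildV1().BuildConfigs("demo").Get(context.TODO(), "myapp", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, t := range bc.Spec.Triggers {
		if t.Type == buildv1.ImageChangeBuildTriggerType && t.ImageChange != nil {
			// Today the controller writes this into spec; the epic moves the
			// authoritative copy into status so GitOps tools can own spec.
			fmt.Println(t.ImageChange.LastTriggeredImageID)
		}
	}
}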

Dependencies

None

Stories and Deliverables

  • BUILD-186: Record LastImageChangeTriggeredID in status

Estimate: (XS, S, M, L, XL, XXL) S

Previous Work:

https://bugzilla.redhat.com/show_bug.cgi?id=1876500

User Stories:

As an OpenShift engineer
I want image change triggers to record their information in the BuildConfig status
So that, eventually, changes to ImageChange triggers in the BuildConfig spec do not cause builds to launch

As a developer using OpenShift to build images
I want to trigger a build when the base image of my application changes
So that bug fixes and security patches are applied to my application.

As an OpenShift cluster admin
I want to be alerted if something cleared the LastImageTriggeredID in a BuildConfig
So that I am aware that cluster users are relying on deprecated behavior.

Success Criteria:

1. Image change triggers record lastImageTriggeredID in the BuildConfig's status
2. Customers are notified that lastImageTriggeredID is deprecated in BuildConfig's spec
3. Customers are alerted if their users are relying on deprecated behavior

Open Questions:


See https://docs.google.com/document/d/1thcVR31TElJBXBGmyaYXZ9jRejp5W4-ppNYn4g3_mgk/edit# for a fuller explanation on how to use this template.

User Story

As an OpenShift engineer
I want image change triggers to record their information in the BuildConfig status
So that, eventually, changes to ImageChange triggers in the BuildConfig spec do not cause builds to launch

As a developer using OpenShift to build images
I want to trigger a build when the base image of my application changes
So that bug fixes and security patches are applied to my application.

Acceptance Criteria

Docs Impact

The release note should inform customers that the lastImageChangeTriggeredID field in the BuildConfig spec is deprecated and will be ignored in OCP 4.9. Users relying on this information should update their scripts and jobs to read the triggered image ID from the BuildConfig status.

Please see https://issues.redhat.com/browse/RHDEVDOCS-2738 for more details.

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

Today, if lastImageTriggeredID is set to the empty string or nil, a build is triggered. By moving the data to status, we can ignore the lastImageTriggeredID field in spec in a future release. To give users proper notice, we are not going to remove the current behavior until a later release.

A deprecation notice will need to be added to the release notes upon completion.

Changes will be needed in the following:

1. API
2. openshift-apiserver
3. openshift-controller-manager (build controller)

See https://bugzilla.redhat.com/show_bug.cgi?id=1876500.


Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

Epic Goal

  • Update OpenShift components that are owned by the Builds + Jenkins Team to use Kubernetes 1.25

Why is this important?

  • Our components need to be updated to ensure that they are using the latest bug/CVE fixes, features, and that they are API compatible with other OpenShift components.

Acceptance Criteria

  • Existing CI/CD tests must be passing

Goal: Support OCI images.

Problem: Buildah and Podman use the OCI format by default, and the OpenShift image registry and ImageStream API don't understand it.

Why is this important: OCI images are expected to replace Docker schema 2 images; OpenShift should be ready when OCI images become widely adopted.
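For context, the two formats differ mainly in their manifest media types, which is what the registry and the ImageStream API have to learn to accept:

package main

import "fmt"

// Standard manifest media types; everything else about the push/pull
// plumbing is largely shared between the two formats.
const (
	dockerSchema2Manifest = "application/vnd.docker.distribution.manifest.v2+json"
	ociManifest           = "application/vnd.oci.image.manifest.v1+json"
)

func main() {
	fmt.Println(dockerSchema2Manifest, ociManifest)
}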

Dependencies (internal and external):

Prioritized epics + deliverables (in scope / not in scope):

Estimate (XS, S, M, L, XL, XXL): XL

Previous Work:

Customers:

Open Questions:

User Story

As a user of OpenShift
I want to push OCI images to the registry
So that I can use buildah and podman with their defaults to push images

Acceptance Criteria

  • An OCI image can be pushed to the registry by buildah or podman
  • An imported OCI image can be pulled from the registry
  • The registry should be able to pull-through OCI images from other registries

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

Add pertinent notes here:

  • Enhancement proposal link
  • Previous product docs
  • Best practices
  • Known issues

Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

Epic Goal

  • Improve CI testing of the image registry components.

Why is this important?

  • The image registry, image API, and the image pruner had a lot of tests removed during the transition to 4.0. This may make the platform less stable and/or slow down the team.

Scenarios

  1. ...

Acceptance Criteria

  • CI - tests should be more stable and have broader coverage

Dependencies (internal and external)

  1. ...

Previous Work (Optional):

Open questions:

Done Checklist

  • CI - CI is running, tests are automated and merged.

As the registry developer
I want e2e-upgrade jobs to monitor availability of the registry during upgrades
So that I can be sure that clients can use the registry without disruptions.

Acceptance Criteria

  • A new test in openshift/origin repo.

Notes

https://github.com/openshift/origin/blob/e6b3d1ece61d7c3ab5a23151c9875e1f9ad36838/test/extended/util/disruption/controlplane/controlplane.go#L69

https://bugzilla.redhat.com/show_bug.cgi?id=1884380

https://github.com/openshift/origin/pull/25475/files marked our tests for ISI as Disruptive.

Tests should wait until operators become stable; otherwise other tests will run on an unstable cluster, and that will cause flakes.

Acceptance Criteria

  • The tests wait until operators are stable after image.config changes.
  • The tests are no longer [Disruptive].
  • If the tests are slow (this depends on other operators, but MCO tends to be slow), they should be marked [Slow].
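A sketch of such a wait, assuming openshift/client-go; the interval, timeout, and operator name are illustrative:

package waiters

import (
	"context"
	"time"

	configv1 "github.com/openshift/api/config/v1"
	configclient "github.com/openshift/client-go/config/clientset/versioned"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForStable polls a ClusterOperator until it reports Available=True,
// Progressing=False and Degraded=False.
func waitForStable(ctx context.Context, client configclient.Interface, name string) error {
	return wait.PollImmediate(10*time.Second, 15*time.Minute, func() (bool, error) {
		co, err := client.ConfigV1().ClusterOperators().Get(ctx, name, metav1.GetOptions{})
		if err != nil {
			return false, nil // tolerate transient errors while polling
		}
		available, progressing, degraded := false, true, true
		for _, c := range co.Status.Conditions {
			switch c.Type {
			case configv1.OperatorAvailable:
				available = c.Status == configv1.ConditionTrue
			case configv1.OperatorProgressing:
				progressing = c.Status == configv1.ConditionTrue
			case configv1.OperatorDegraded:
				degraded = c.Status == configv1.ConditionTrue
			}
		}
		return available && !progressing && !degraded, nil
	})
}

For example, waitForStable(ctx, client, "machine-config") before re-running tests that change image.config.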

The integration tests for the image registry expect that OpenShift and the tests run on the same machine (i.e. OpenShift can connect to sockets that the tests listen on). This is not the case with e2e tests.

Acceptance Criteria

  • every integration test is converted into an e2e test or a techdebt story
  • image-registry tests are green

Goal: Rebase the registry to a tagged Docker Distribution release.

Problem: The registry is currently based on an outdated version of the upstream docker/distribution project. The base does not even have a version associated with it: DevEx last rebased on an untagged commit.

Why is this important: Update the registry with improvements and bug fixes from the upstream community.

Dependencies (internal and external):

Prioritized epics + deliverables (in scope / not in scope):

Estimate (XS, S, M, L, XL, XXL): M

Previous Work:

Customers:

 

Open questions:

User Story

As a user of OpenShift
I want the image registry to be rebased on the latest docker/distribution release (v2.7.1)
So that the image registry has the latest upstream bugfixes and enhancements

Acceptance Criteria

  • Image registry is based on docker/distribution v2.7.1

Launch Checklist

Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated

Notes

Add pertinent notes here:

  • Enhancement proposal link
  • Previous product docs
  • Best practices
  • Known issues

Guiding Questions

User Story

  • Is this intended for an administrator, application developer, or other type of OpenShift user?
  • What experience level is this intended for? New, experienced, etc.?
  • Why is this story important? What problems does this solve? What benefit(s) will the customer experience?
  • Is this part of a larger epic or initiative? If so, ensure that the story is linked to the appropriate epic and/or initiative.

Acceptance Criteria

  • How should a customer use and/or configure the feature?
  • Are there any prerequisites for using/enabling the feature?

Notes

  • Is this a new feature, or an enhancement of an existing feature? If the latter, list the feature and docs reference.
  • Are there any new terms, abbreviations, or commands introduced with this story? Ex: a new command line argument, a new custom resource.
  • Are there any recommended best practices when using this feature?
  • On feature completion, are there any known issues that customers should be aware of?

Other Complete

This section includes Jira cards that are not linked to either an Epic or a Feature. These tickets were completed when this image was assembled

Description of problem:

oc --context build02 get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-ec.1   True        False         45h     Error while reconciling 4.12.0-ec.1: the cluster operator kube-controller-manager is degraded

oc --context build02 get co kube-controller-manager
NAME                      VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
kube-controller-manager   4.12.0-ec.1   True        False         True       2y87d   GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 172.30.153.28:9091: connect: cannot assign requested address

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

build02 is a build farm cluster in CI production.
I can provide credentials to access the cluster if needed.

discover-etcd-initial-cluster was written very early in the cluster-etcd-operator life cycle. We have observed at least one bug in this code, and in order to validate logical correctness it needs to be rewritten with unit tests.

PR: https://github.com/openshift/etcd/pull/73

With the CSISnapshot capability disabled, all CSI driver operators are Degraded. For example, the AWS EBS CSI driver operator during installation:

18:12:16.895: Some cluster operators are not ready: storage (Degraded=True AWSEBSCSIDriverOperatorCR_AWSEBSDriverStaticResourcesController_SyncError: AWSEBSCSIDriverOperatorCRDegraded: AWSEBSDriverStaticResourcesControllerDegraded: "volumesnapshotclass.yaml" (string): the server could not find the requested resource
AWSEBSCSIDriverOperatorCRDegraded: AWSEBSDriverStaticResourcesControllerDegraded: )
Ginkgo exit error 1: exit with code 1}

Version-Release number of selected component (if applicable):
4.12.nightly

The reason is that cluster-csi-snapshot-controller-operator does not create the VolumeSnapshotClass CRD, which the AWS EBS CSI driver operator expects to exist.

CSI driver operators must skip VolumeSnapshotClass creation if the CRD does not exist.
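A sketch of the CRD existence check, assuming the apiextensions clientset:

package snapshotcheck

import (
	"context"

	apiextclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// volumeSnapshotClassCRDExists reports whether the VolumeSnapshotClass CRD
// is installed, so an operator can skip syncing volumesnapshotclass.yaml
// when the CSISnapshot capability is disabled.
func volumeSnapshotClassCRDExists(ctx context.Context, client apiextclient.Interface) (bool, error) {
	_, err := client.ApiextensionsV1().CustomResourceDefinitions().Get(ctx, "volumesnapshotclasses.snapshot.storage.k8s.io", metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}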

Description of problem:

In looking at jobs on an accepted payload at https://amd64.ocp.releases.ci.openshift.org/releasestream/4.12.0-0.ci/release/4.12.0-0.ci-2022-08-30-122201 , I observed this job https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-e2e-aws-sdn-serial/1564589538850902016 with "Undiagnosed panic detected in pod" "pods/openshift-controller-manager-operator_openshift-controller-manager-operator-74bf985788-8v9qb_openshift-controller-manager-operator.log.gz:E0830 12:41:48.029165       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)" 

Version-Release number of selected component (if applicable):

4.12

How reproducible:

probably relatively easy to reproduce (but not consistently) given it's happened several times according to this search: https://search.ci.openshift.org/?search=Observed+a+panic%3A+%22invalid+memory+address+or+nil+pointer+dereference%22&maxAge=48h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Steps to Reproduce:

1. let nightly payloads run or run one of the presubmit jobs mentioned in the search above
2.
3.

Actual results:

Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)}

Expected results:

no panics

Additional info: