European Case Law Identifier: | ECLI:EP:BA:2019:T071014.20190708
---|---
Date of decision: | 08 July 2019
Case number: | T 0710/14
Application number: | 10008909.3
IPC class: | H04N 7/26, H04N 7/50
Language of proceedings: | EN
Distribution: | D
Title of application: | Improved interpolation of video compression frames
Applicant name: | Dolby Laboratories Licensing Corporation
Opponent name: | -
Board: | 3.5.04
Headnote: | -
Relevant legal provisions: |
Keywords: | Inventive step - (yes)
Catchwords: | -
Cited decisions: |
Citing decisions: |
Summary of Facts and Submissions
I. The appeal is against the decision to refuse European patent application No. 10 008 909.3, published as EP 2 262 268 A2. The application is a divisional application of earlier European patent application No. 10 005 839.5, published as EP 2 254 339 A2, which in turn is a divisional application of earlier European patent application No. 03 762 175.2, published as international application WO 2004/004310 A2. The present appeal is related to appeal case T 708/14, which concerns the earlier application No. 10 005 839.5.
II. The examining division refused the present patent application on the grounds that the subject-matter of the independent claims of the then main and first auxiliary requests lacked inventive step in view of documents:
D2: ITU Study Group 16 - Video Coding Experts Group - ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q6), 11th Meeting, Portland, OR, USA, 22-25 August 2000, no. q15k44, pages 1-2, XP030003136;
D3: Hannuksela, M.: "Generalized B/MH-Picture Averaging", ITU Study Group 16 - Video Coding Experts Group - ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q6), 3rd Meeting, Fairfax, VA, USA, 6-10 May 2002, no. JVT-C077, pages 1-8, XP030005186;
D4: Bjontegaard, G. et al.: "H.26L Test Model Long Term Number 4 (TML-4)", 10. VCEG Meeting; 16-05-2000 - 19-05-2000; Osaka, JP; (Video Coding Experts Group of ITU-T SG.16), no. q15j72d0, 16 June 2000, XP030003092, ISSN: 0000-0464.
It also referred to the following document:
D5: Kikuchi Y.: "Improved multiple frame motion compensation using frame interpolation", ITU Study Group 16 - Video Coding Experts Group - ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q6), 2nd Meeting, Geneva, CH, Jan. 29 - Feb. 1, 2002, no. JVT-B075, pages 1-8, XP030005075.
III. The applicant filed notice of appeal against this decision, requesting that it be set aside. With its statement of grounds of appeal the appellant requested that a patent be granted on the basis of the claims of the main request or the auxiliary request on which the decision under appeal is based.
IV. In response to the summons to oral proceedings, by a letter dated 12 February 2019, the appellant filed amended claims according to a new main request and renumbered its previous main and auxiliary requests to become the first and second auxiliary requests. It also filed description pages 4a and 5 and identified the description pages and drawings to be used for the requested grant of a patent on the basis of the new main request.
V. In response the board cancelled the oral proceedings.
VI. The requests of the appellant are therefore that the decision under appeal be set aside and that a patent be granted on the basis of the claims of the main request filed by letter of 12 February 2019, the first auxiliary request, corresponding to the main request on which the decision under appeal is based, or the second auxiliary request, corresponding to the auxiliary request on which the decision under appeal is based.
VII. Independent claims 1 to 4 of the main request read as follows:
"1. A method for video image compression using direct mode prediction, including:
providing a sequence of predicted and bi-directionally predicted frames each comprising pixel values arranged in picture areas, wherein at least one of the picture areas in at least one of the bi-directionally predicted frames is predicted from picture areas in one or more reference frames using direct mode prediction; and
determining a predicted motion vector for a bi-directionally predicted frame with a frame-distance method in direct mode prediction, wherein the frame-distance method comprises: given a predicted picture area in a first reference frame, the bi-directionally predicted frame with another predicted picture area, and a motion vector between the predicted picture area in the first reference frame and a picture area in a second reference frame, the predicted motion vector is the motion vector multiplied by a frame scale fraction, wherein a numerator of the frame scale fraction is equal to a first distance between the bi-directionally predicted frame of the another predicted picture area and the first reference frame and a denominator of the frame scale fraction is equal to a second distance between the first reference frame and the second reference frame referenced by the first reference frame,
wherein both the first and second reference frames are in a display order that is prior to a display order of the bi-directionally predicted frame of the predicted picture area.
2. A video compression system adapted to
provide a sequence of predicted and bi-directionally predicted frames each comprising pixel values arranged in picture areas, wherein at least one of the picture areas in at least one of the bi-directionally predicted frames is predicted from picture areas in one or more reference frames using direct mode prediction; and
determine a predicted motion vector for a bi-directionally predicted frame with a frame-distance method in direct mode prediction, wherein the frame-distance method comprises: given a predicted picture area in a first reference frame, the bi-directionally predicted frame with another predicted picture area, and a motion vector between the predicted picture area in the first reference frame and a picture area in a second reference frame, the predicted motion vector is the motion vector multiplied by a frame scale fraction, wherein a numerator of the frame scale fraction is equal to a first distance between the bi-directionally predicted frame of the another predicted picture area and the first reference frame and a denominator of the frame scale fraction is equal to a second distance between the first reference frame and the second reference frame referenced by the first reference frame,
wherein both the first and second reference frames are in a display order that is prior to a display order of the bi-directionally predicted frame of the predicted picture area.
3. A method for video image decompression using direct mode prediction, including:
receiving a sequence of predicted and bi-directionally predicted frames each comprising pixel values arranged in picture areas, wherein at least one of the picture areas in at least one of the bi-directionally predicted frames is predicted from picture areas in one or more reference frames using direct mode prediction; and
determining a predicted motion vector for a bi-directionally predicted frame with a frame-distance method in direct mode prediction, wherein the frame-distance method comprises: given a predicted picture area in a first reference frame, the bi-directionally predicted frame with another predicted picture area, and a motion vector between the predicted picture area in the first reference frame and a picture area in a second reference frame, the predicted motion vector is the motion vector multiplied by a frame scale fraction, wherein a numerator of the frame scale fraction is equal to a first distance between the bi-directionally predicted frame of the another predicted picture area and the first reference frame and a denominator of the frame scale fraction is equal to a second distance between the first reference frame and the second reference frame referenced by the first reference frame,
wherein both the first and second reference frames are in a display order that is prior to a display order of the bi-directionally predicted frame of the predicted picture area.
4. A video decompression system adapted to
receive a sequence of predicted and bi-directionally predicted frames each comprising pixel values arranged in picture areas, wherein at least one of the picture areas in at least one of the bi-directionally predicted frames is predicted from picture areas in one or more reference frames using direct mode prediction; and
determining a predicted motion vector for a bi-directionally predicted frame with a frame-distance method in direct mode prediction, wherein the frame-distance method comprises: given a predicted picture area in a first reference frame, the bi-directionally predicted frame with another predicted picture area, and a motion vector between the predicted picture area in the first reference frame and a picture area in a second reference frame, the predicted motion vector is the motion vector multiplied by a frame scale fraction, wherein a numerator of the frame scale fraction is equal to a first distance between the bi-directionally predicted frame of the another predicted picture area and the first reference frame and a denominator of the frame scale fraction is equal to a second distance between the first reference frame and the second reference frame referenced by the first reference frame,
wherein both the first and second reference frames are in a display order that is prior to a display order of the bi-directionally predicted frame of the predicted picture area."
VIII. In the decision under appeal, the examining division had held that document D2 was the closest prior art with regard to the claimed subject-matter and that it implicitly included the disclosure of document D4 and contained an improvement of the direct mode disclosed in D4, wherein the weights of D2 were made equal to corresponding weights used for motion vector scaling.
The subject-matter of claim 1 of the then main request was distinguished from D2 in that both the first and the second reference frames were in a display order that was prior to a display order of the bi-directionally predicted frame of the predicted picture area.
The problem to be solved by the present invention might therefore be regarded as increasing flexibility of coding. The claimed solution was obvious in view of D2 in combination with D3. Document D3 referenced document D2 (see page 1, paragraph "1. Summary"; pages 2 and 3, point 2.2.1) and proposed a solution aimed at increasing flexibility. On page 1, paragraph "1. Summary" and pages 3 to 4, paragraph "3. Generalized weighting of MH-Pictures", document D3 disclosed a method for interpolating a B-frame from two reference frames, where the three frames were in arbitrary order (i.e. their occurrence times T, T1 and T2 had an arbitrary order).
The syntax enabling B-frame interpolation according to the teachings of document D3, i.e. the calculation of the prediction weights for determining the B-frame predictor, was disclosed in D3, pages 4 and 5, paragraph 4.2. The particular interpolation weights for the B-frame interpolation were defined in equation (4). With the formal definitions provided by equation (5), i.e. TRB = T-T1, and TRD = T2-T1, equation (4) could be rewritten as equation (6).
Equation (6) referred to weighted averaging for coding B-frames, as disclosed in document D2, paragraph "Description", referenced in relation to said averaging in document D3, page 1, paragraph 1, first line and pages 2 and 3, paragraph 2.2.1.
Document D3 (see pages 2 and 3, paragraph 3, in particular the typographic paragraph bridging pages 2 and 3) disclosed that equation (6), with the formal definitions provided by equation (5), applied also to an arbitrary order of the B-frame to be interpolated and of the two frames used as references for the interpolation. The equations of D3 also applied if the B-frame was followed or preceded by two P frames. Therefore, the skilled person would arrive at the invention as claimed if they combined the teachings of D3 with the direct mode coding of a B-picture in document D2.
Hence, the solution proposed in claim 1 of the present application was obvious in view of D2 and D3 (see decision under appeal, Reasons, points 2.1 to 2.4).
Reasons for the Decision
1. The appeal is admissible.
The invention
2. The invention relates to video (de-)compression, in particular to a method for improved interpolation of video frames in MPEG-encoding systems.
2.1 It was previously known to encode frames as bi-directionally predicted (B-)frames using bidirectional mode or direct mode.
In bidirectional mode, blocks of the bi-directionally predicted frame are encoded using forward and backward motion vectors describing the motion of a macroblock in the predicted frame with respect to macroblocks in a subsequent (forward) and a preceding (backward) (I- or P-)reference frame. The motion vectors are transmitted from the encoder to the decoder to enable reconstruction of the bi-directionally predicted frame.
In contrast, in direct mode no separate motion vectors are transmitted for a bi-directionally predicted frame. Instead, the motion vectors for the bi-directionally predicted frame are derived from the motion vector between the subsequent reference frame and the preceding reference frame using a proportional weighting corresponding to time distances from the bi-directionally predicted frame to these reference frames (called "motion vector interpolation"; see paragraphs [0011], [0013] and [0014] of the application as filed and D4, chapter 6.4.2).
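By way of illustration only, the proportional weighting described above may be sketched as follows. The exact formulas of D4, chapter 6.4.2, are not reproduced in this decision, so the sketch assumes the common linear formulation; the function name, the direction convention of the vectors and the symbols trb and trd (borrowed from the notation quoted under point VIII) are assumptions made for the example.

```python
from fractions import Fraction

def direct_mode_vectors(mv, trb, trd):
    """Derive both vectors of a B-frame block from one reference-frame vector.

    mv  : (x, y) vector of the co-located block in the subsequent reference
          frame, pointing to the preceding reference frame (assumed convention)
    trb : time distance from the preceding reference frame to the B-frame
    trd : time distance between the two reference frames
    """
    # Vector from the B-frame towards the preceding reference frame.
    to_preceding = tuple(Fraction(trb, trd) * c for c in mv)
    # Vector from the B-frame towards the subsequent reference frame
    # (opposite sign, because that frame lies on the other side in time).
    to_subsequent = tuple(Fraction(trb - trd, trd) * c for c in mv)
    return to_preceding, to_subsequent

# Hypothetical example: B-frame one interval after the preceding reference,
# reference frames three intervals apart.
to_prev, to_next = direct_mode_vectors((6, -3), trb=1, trd=3)
```

In this sketch no vector is transmitted for the B-frame itself; both vectors follow from the reference-frame vector and the two time distances, and the scale factor always lies between 0 and 1 in magnitude because the B-frame lies between the two reference frames.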
2.2 The present application proposes a direct mode extension that makes it possible to extrapolate a motion vector (denoted by mv in Figure 16 of the present application, which is reproduced below) of a first reference frame (P1) which is used to predict a second reference frame (P2).
[Figure 16 of the application as filed]
To account for the extrapolation, the motion vector has to be multiplied by a frame scale fraction which is equal to the distance between the bi-directionally predicted frame and the first reference frame divided by the distance between the first and second reference frames (4/3 of mv); see application as filed, paragraphs [0145] and [0163] to [0168] together with Figures 16 and 17.
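The frame scale fraction may be illustrated by the following minimal sketch. The function names and the display positions used in the example are hypothetical; the positions were chosen only so that the fraction comes out as the 4/3 mentioned above, and they are not taken from Figure 16. Sign and direction conventions of the resulting vector are likewise left open.

```python
from fractions import Fraction

def frame_scale_fraction(t_b, t_ref1, t_ref2):
    """Frame scale fraction as described for the frame-distance method.

    t_b    : display position of the bi-directionally predicted frame
    t_ref1 : display position of the first reference frame
    t_ref2 : display position of the second reference frame,
             which is referenced by the first reference frame
    """
    numerator = abs(t_b - t_ref1)       # distance B-frame to first reference
    denominator = abs(t_ref1 - t_ref2)  # distance first to second reference
    return Fraction(numerator, denominator)

def predicted_motion_vector(mv, t_b, t_ref1, t_ref2):
    """Multiply the reference-frame vector mv by the frame scale fraction."""
    s = frame_scale_fraction(t_b, t_ref1, t_ref2)
    return tuple(s * c for c in mv)

# Hypothetical positions: both reference frames precede the B-frame.
s = frame_scale_fraction(t_b=7, t_ref1=3, t_ref2=0)
pmv = predicted_motion_vector((3, -6), t_b=7, t_ref1=3, t_ref2=0)
```

With these assumed positions the fraction exceeds 1, which is what distinguishes extrapolation beyond the two reference frames from interpolation between them.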
Amendments (Articles 76(1) and 123(2) EPC)
3. Compared with claim 1 of the main request on which the decision under appeal is based, the present claim 1 has been restricted to specify that direct mode prediction is used for the extrapolated motion vector. It also specifies the frame scale fraction more exactly than in the previous set of claims by defining the terms of the equation.
3.1 A basis for these amendments can be found in Figures 16 and 17 and paragraphs [0148], [0163] and [0164] of the application as filed and at the same location in both earlier applications as filed. Corresponding amendments have been made to the other independent claims 2, 3 and 4.
3.2 Hence, the board finds that the claims of the appellant's main request do not contain subject-matter extending beyond the content of the application or the earlier applications as filed, and that they thus comply with Articles 76(1) and 123(2) EPC.
Inventive step, Article 56 EPC
4. It is common ground that D2 may be considered the closest prior art for the subject-matter of claim 1.
4.1 D2 refers to a Test Model Long Term Number (TML) simulation model which is described in detail in D4 and in particular refers to direct mode prediction in TML. D2 therefore implicitly includes the motion vector interpolation features from chapter 6 of D4 and hence discloses the features of the direct mode prediction described under point 2.1 above. In particular, D4, chapter 6.4.2, discloses a frame scale fraction equal to the distance between the bi-directionally predicted frame and the first reference frame divided by the distance between the first and second reference frames. As an improvement over D4, D2 proposes to interpolate the pixel value of a predicted block in a B-frame on the basis of time distances to the preceding and subsequent reference frames (see D2, chapter "Description"). Hence, D2 proposes a pixel value interpolation which is performed in a similar manner to the motion vector interpolation in D4.
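For illustration, a pixel value interpolation based on time distances may be sketched as follows. The formula of D2 is not reproduced in this decision; the sketch assumes a linear weighting in which the nearer reference frame receives the larger weight, and all names are hypothetical.

```python
from fractions import Fraction

def interpolate_pixel(p_prev, p_next, trb, trd):
    """Blend two reference pixel values according to time distances.

    p_prev : pixel value in the preceding reference frame
    p_next : pixel value in the subsequent reference frame
    trb    : time distance from the preceding reference frame to the B-frame
    trd    : time distance between the two reference frames
    """
    w_next = Fraction(trb, trd)  # grows as the B-frame nears the later frame
    w_prev = 1 - w_next          # the two weights always sum to 1
    return w_prev * p_prev + w_next * p_next

# Hypothetical example: B-frame one interval after the preceding reference.
value = interpolate_pixel(100, 160, trb=1, trd=3)
```

The operation acts on pixel values, not on motion vectors, although it uses the same kind of distance ratio as the motion vector scaling.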
4.2 It follows that D2 does not disclose the following feature of claim 1:
- the two referenceable frames are previous in display order to the bi-directionally predicted frame
which is in line with the examining division's finding in the decision under appeal (see point VIII above).
4.3 The distinguishing feature provides further options for encoding bi-directionally predicted frames, which increases flexibility at the encoder with possible gains in compression efficiency. The board therefore also agrees with the examining division's finding that the objective technical problem to be solved by the present invention is increasing the flexibility of coding.
4.4 In the decision under appeal, documents D2 and D4 were combined with documents D3 and/or D5.
4.4.1 D5 discloses a pixel value interpolation based on multiple reference frames (MH-pictures). D5, chapter 2.2, refers to "interpolative motion compensation", according to which a motion vector (see mv2 of D5, Figure 2, which is reproduced below) to reference frames other than the nearest one is not coded but instead is derived by scaling (extrapolating) the motion vector to the nearest reference frame (mv1) and adding a differential motion vector (dmv). Pixel values of the predicted area are interpolated using equal weighting factors for the (two) reference frame pixel areas or a weighting adapted to video sequences with fading (see D5, chapters 2.2 and 4).
[Figure 2 of D5]
Hence, D5 is similar to the present application in that it involves two reference frames which precede the present frame in display order and in that it scales a motion vector by a frame scale fraction (see the distinguishing feature).
However, according to D5 a motion vector from the present frame to the nearest reference frame is extrapolated to a further reference frame, which is prior to the first reference frame in display order. In contrast, according to claim 1 of the present main request, a motion vector between two previous reference frames is extrapolated to the present frame (see point 2.2 above). By using the present frame as an end point of a motion vector, D5 teaches away from direct mode prediction.
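The derivation attributed to D5 above may be sketched as follows, purely as an illustration. D5 is only summarised in this decision, so the scale factor (assumed here to be the ratio of the two time distances from the present frame) and all names are assumptions.

```python
from fractions import Fraction

def d5_style_mv2(mv1, dmv, d_near, d_far):
    """Derive the vector to a further reference frame from the nearest one.

    mv1    : (x, y) vector from the present frame to the nearest reference frame
    dmv    : (x, y) differential motion vector added after scaling
    d_near : time distance from the present frame to the nearest reference frame
    d_far  : time distance from the present frame to the further reference frame
    """
    s = Fraction(d_far, d_near)  # assumed proportional scaling
    return tuple(s * a + b for a, b in zip(mv1, dmv))

# Hypothetical example: further reference frame twice as far away.
mv2 = d5_style_mv2(mv1=(2, -1), dmv=(1, 0), d_near=1, d_far=2)
```

In this sketch the input vector mv1 starts at the present frame and has to be available for that frame, whereas the frame-distance method of claim 1 starts from a vector between the two reference frames.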
4.4.2 D3 refers to D2 ("Q15-K44") and D5 ("JVT-B075") and proposes an improvement of the pixel value interpolation of D5 such that "the temporal order of the prediction frames ... is not restricted at all" (see page 3, last paragraph). On the basis of the weighting of pixel values for fades in D5, a formula for the improved pixel value interpolation is derived which applies both to the "conventional B-picture coding order" and to an unrestricted order of reference frames.
D3 does not refer to direct mode prediction. In addition, it is concerned with pixel value interpolation, but not with motion vector interpolation. In particular, equations (3) to (7) of D3, which were cited in the decision under appeal, pertain to pixel value interpolation, i.e. a pixel value P is determined as a blend of several reference frame pixel values. The evaluation of D5 in D3 focuses solely on pixel value interpolation based on macroblocks designated by two motion vectors (see D3, points 2.2.1 and 3, first paragraph), whereas the motion vector extrapolation of D5 is not considered in D3. In addition, as has been discussed under point 4.4.1 above, the extrapolation in D5 is different from the extrapolation in claim 1. Hence, even if D3 were considered to include the motion vector extrapolation of D5, the skilled person would not construe document D3 as suggesting an extrapolation of a motion vector between two previous reference frames to the predicted frame. Thus the combination of D2 with D3 does not result in direct mode frame prediction involving two reference frames previous in display order to the present frame.
4.4.3 Essentially, the invention's contribution to the technical field is considered to be the realisation that a direct mode prediction of a B-frame can be based on two reference frames which are both prior in display order to that B-frame. This concept is not rendered obvious by the combination of documents D2 to D5, even though equation (6) of D3 applies to an arbitrary order of frames as argued by the examining division. Equation (6) applies to pixel interpolation and not motion vector interpolation, and more importantly, there is nothing in the available prior art to indicate that the skilled person would have considered extrapolating a motion vector in direct mode.
4.5 As a consequence, the subject-matter of claim 1 according to the main request involves an inventive step in view of documents D2 and D4 in combination with documents D3 and/or D5. Moreover, the board cannot see any other document or combination of documents on file by which the skilled person would have arrived at the claimed subject-matter.
4.6 It follows that the subject-matter of claim 1 and the further independent claims 2 to 4, which are restricted by features corresponding to those of claim 1, involves an inventive step (Article 56 EPC).
Amended description
5. The description has been amended in line with the claims of the main request and complies with the EPC.
Conclusion
6. In view of the above, the present case is to be remitted to the examining division with the order to grant a patent on the basis of the appellant's main request.
Order
For these reasons it is decided that:
1. The decision under appeal is set aside.
2. The case is remitted to the examining division with the order to grant a patent in the following version:
Description:
- pages 1 to 3, 6 to 9, 11, 12, 14, 15, 17, 18, 20, 23 to 28, 30 to 40 as originally filed
- pages 4, 10, 13, 16, 19, 21, 22 and 29 filed in electronic form on 17 April 2013
- pages 4a and 5 filed by letter of 12 February 2019
Claims:
Nos. 1 to 4 filed by letter of 12 February 2019
Drawings:
Sheets 1/15 to 15/15 as originally filed.