European Case Law Identifier: ECLI:EP:BA:2024:T199822.20241220
Date of decision: 20 December 2024
Case number: T 1998/22
Application number: 16826643.5
IPC class: G06N 3/04
Language of proceedings: EN
Distribution: D
Title of application: WIDE AND DEEP MACHINE LEARNING MODELS
Applicant name: Google LLC
Opponent name: -
Board: 3.5.06
Headnote: -
Relevant legal provisions: European Patent Convention Art 52(1), 52(2)(c), 56, 84; RPBA 2020 Art 12(4), 15(1)
Keywords: Clarity - main request (no); Inventive step - all requests (no)
Catchwords: -
Cited decisions: G 1/19, T 697/17, T 817/16
Summary of Facts and Submissions
I. The appeal is against the decision of the examining division to refuse the European patent application No. 16826643.5 on the basis that the main request and the first and second auxiliary requests then on file did not meet the requirements of Articles 84 and 56 EPC. Reference was made inter alia to
D1: Shuang Yang et al., "Neural network ensembles: combining multiple models for enhanced performance using a multistage approach", Expert Systems, vol. 21, no. 5, November 2004, pages 279-288, XP055353883.
II. With the statement of grounds of appeal, the appellant requested that the decision of the examining division be set aside and that a patent be granted on the basis of the main request or, alternatively, of the first or second auxiliary request, all filed with the statement of grounds of appeal. As a precaution, oral proceedings were requested.
III. Following a summons to oral proceedings, the board presented its preliminary opinion in a communication pursuant to Article 15(1) RPBA. Claim 1 of the main request appeared not to be clear (Article 84 EPC), and claim 1 of all requests did not appear to make any technical contribution over a conventional general-purpose computer and therefore appeared to lack an inventive step (Articles 52(1) and 56 EPC).
IV. On 6 September 2024, the appellant indicated that it would not attend the oral proceedings and withdrew its request for oral proceedings. The appellant did not comment on the board's preliminary opinion.
V. The board thereupon cancelled the oral proceedings.
VI. Claim 1 of the main request reads as follows:
"A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a combined machine learning model (102) for processing a machine learning input comprising a plurality of features (108-122) to generate a predicted output (136) for the machine learning input, the combined machine learning model comprising:
a deep machine learning model (104) configured to process the features to generate a deep model intermediate predicted output;
a wide machine learning model (106) configured to process the features to generate a wide model intermediate predicted output; and
a combining layer (134) configured to process the deep model intermediate predicted output generated by the deep machine learning model and the wide model intermediate predicted output generated by the wide machine learning model to generate the predicted output,
wherein the deep machine learning model and the wide machine learning model have been trained jointly on training data to generate the deep model intermediate predicted output and the wide model intermediate predicted output by backpropagating a gradient determined from an error between a predicted output for a training input and the known output for the training input through the combining layer to the wide machine learning model and the deep machine learning model to jointly adjust the current values of the parameters of the deep machine learning model and the wide machine learning model."
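By way of illustration only - the following sketch is not part of the claim or of the application, and the framework (PyTorch), layer sizes and all identifiers are assumptions - the arrangement recited in claim 1 might be implemented along these lines:

import torch
import torch.nn as nn

class WideAndDeep(nn.Module):
    def __init__(self, num_features: int, hidden: int = 32):
        super().__init__()
        # "Deep machine learning model": here a small multilayer perceptron
        self.deep = nn.Sequential(
            nn.Linear(num_features, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
        # "Wide machine learning model": here a single linear map
        self.wide = nn.Linear(num_features, 1)
        # "Combining layer": combines the two intermediate predicted outputs
        self.combine = nn.Linear(2, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        deep_out = self.deep(x)  # deep model intermediate predicted output
        wide_out = self.wide(x)  # wide model intermediate predicted output
        return self.combine(torch.cat([deep_out, wide_out], dim=1))

# Joint training: one backward pass propagates the gradient of the error
# through the combining layer to both sub-models, so that the parameters
# of the deep and the wide model are adjusted together.
model = WideAndDeep(num_features=8)
optimiser = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x = torch.randn(64, 8)   # dummy training inputs
y = torch.randn(64, 1)   # dummy known outputs
for _ in range(100):
    optimiser.zero_grad()
    loss = loss_fn(model(x), y)  # error between predicted and known output
    loss.backward()              # backpropagate through the combining layer
    optimiser.step()             # jointly adjust both models' parameters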
VII. Claim 1 of auxiliary request 1 differs from claim 1 of the main request only in that it comprises the additional feature
"wherein the wide machine learning model (106) is a generalized linear model (132)".
VIII. Claim 1 of auxiliary request 2 reads as follows:
"A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a combined machine learning model (102) for processing a machine learning input comprising a plurality of input features (108-122) to generate a predicted output (136) for the machine learning input, the combined machine learning model comprising:
a deep machine learning model (104) configured to process a first set of features included in the machine learning input to generate a deep model intermediate predicted output;
a wide machine learning model (106) configured to (i) apply a cross-product feature transformation to a subset of a second set of features included in the machine learning input to generate transformed features and (ii) to process, using a generalized linear model, the transformed features and the input features in the second set to generate a wide model intermediate predicted output; and
a combining layer (134) configured to process the deep model intermediate predicted output generated by the deep machine learning model and the wide model intermediate predicted output generated by the wide machine learning model to generate the predicted output,
wherein the deep machine learning model and the wide machine learning model have been trained jointly on training data to generate the deep model intermediate predicted output and the wide model intermediate predicted output by backpropagating a gradient determined from an error between a predicted output for a training input and the known output for the training input through the combining layer to the wide machine learning model and the deep machine learning model to jointly adjust the current values of the parameters of the deep machine learning model and the wide machine learning model."
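Purely for illustration - the helper names, the choice of a logistic link and the use of binary features are assumptions, not features of the claim - the "wide" part recited in claim 1 of auxiliary request 2 might be sketched as follows: step (i) applies a cross-product feature transformation to a subset of the second-set features, and step (ii) feeds the transformed features together with the original second-set features into a generalized linear model:

import numpy as np
from itertools import combinations

def cross_product(x: np.ndarray) -> np.ndarray:
    # All pairwise AND-combinations of binary features: a cross-product
    # feature is 1 only when both constituent features co-occur.
    return np.stack([x[:, i] * x[:, j]
                     for i, j in combinations(range(x.shape[1]), 2)], axis=1)

def wide_model_output(x2: np.ndarray, subset: list[int],
                      w: np.ndarray, b: float) -> np.ndarray:
    transformed = cross_product(x2[:, subset])        # step (i)
    features = np.concatenate([transformed, x2], axis=1)
    z = features @ w + b                              # linear predictor
    return 1.0 / (1.0 + np.exp(-z))                   # step (ii): logistic link

rng = np.random.default_rng(0)
x2 = rng.integers(0, 2, size=(4, 3)).astype(float)    # dummy binary features
subset = [0, 1]                                       # assumed subset
w = rng.normal(size=1 + x2.shape[1])                  # 1 crossed + 3 raw features
print(wide_model_output(x2, subset, w, 0.0))          # wide model intermediate output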
Reasons for the Decision
The application
1. The application relates to machine learning. It is proposed to combine a "deep machine learning model" and a "wide machine learning model" with a "combining layer", so as to obtain, for a set of "features" provided as input, a "predicted output" from the outputs of the two models (paragraphs [2]-[5]). See figure 1:
[Figure 1 of the application: the combined machine learning model 102]
2. The "deep machine learning model" may include "a deep neural network 130" and an "embedding layer 150" (paragraph [34]).
The "wide machine learning model" may include "a wide and shallow model, e.g. a generalized linear model 138" and a "cross-product feature transformation 132".
3. According to the description, "in general, a wide machine learning model can memorize feature interactions through a wide set of cross-product feature transformations and a deep machine learning model can generalize unseen feature combinations by applying embedding functions to the input features", and "by including both deep machine learning model and wide machine learning model, the wide and deep machine learning model can obtain both benefits of memorization and generalization and thus can perform better on predicting an output from a set of input features" (paragraph [13]).
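For illustration only (framework, vocabulary size and dimensions are assumptions, not taken from the application): the "embedding functions" referred to in this passage map sparse categorical features to dense vectors, which is what allows the deep model to produce sensible outputs for feature combinations not seen during training:

import torch
import torch.nn as nn

# An embedding layer maps each categorical feature value (a token id) to a
# learned dense vector; similar values end up with nearby vectors, so
# unseen combinations of known values can still be generalized over.
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=8)
tokens = torch.tensor([[3, 17, 256]])  # dummy token ids for one input
dense = embedding(tokens)
print(dense.shape)                     # torch.Size([1, 3, 8])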
4. In an embodiment, to which the independent claims of the present requests are limited, the deep and the wide models are (or have been) trained "jointly" based on training data using backpropagation (paragraphs [54]-[62]).
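In formula terms (the notation below is assumed for illustration and does not appear in the application): writing $L$ for the error between the predicted output $\hat{y} = c(o_d, o_w)$ and the known output, where $c$ is the combining layer and $o_d$, $o_w$ are the intermediate predicted outputs of the deep and wide models with parameters $\theta_d$ and $\theta_w$, joint training backpropagates

\[ \frac{\partial L}{\partial \theta_d} = \frac{\partial L}{\partial \hat{y}}\,\frac{\partial c}{\partial o_d}\,\frac{\partial o_d}{\partial \theta_d}, \qquad \frac{\partial L}{\partial \theta_w} = \frac{\partial L}{\partial \hat{y}}\,\frac{\partial c}{\partial o_w}\,\frac{\partial o_w}{\partial \theta_w}, \]

so that a single backward pass through the combining layer adjusts $\theta_d$ and $\theta_w$ together.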
5. In a first application example, the input may be a sequence of words, the features may be tokens representing the words in the sequence and the predicted output may be "a likelihood that a particular word is the next word in the sequence or a prediction for a part of speech or a word sense for a particular word in the sequence" (paragraph [20]).
In a second application example, the input may be "features of a content presentation setting" and the output "a score that represents a likelihood that a particular objective will be satisfied if the content item is presented in the content presentation setting". This may be, for instance, presenting on a website a product that the user is likely to purchase based on user features, or recommending an app in an online app store (paragraphs [21]-[33]).
All requests - Admittance
6. The main request and the first and second auxiliary requests were first filed with the statement of grounds of appeal. Their claim sets differ from the claim sets of the main, first and second auxiliary requests that underlie the contested decision only in that the expression "for instance" has been deleted in claim 9 of all requests.
This amendment addresses one of the objections under Article 84 EPC raised in the contested decision (point 12.2), does not introduce any new issue, and does not alter the subject-matter that is otherwise to be considered.
The board therefore exercises its discretion under Article 12(4) RPBA by admitting the main, first and second auxiliary requests filed with the statement of grounds of appeal.
Main request - Article 84 EPC
7. "Wide machine learning model"
7.1 The examining division objected under Article 84 EPC that the expression "a wide machine learning model" - used e.g. in claim 1 - was "vague and unclear and [left] the skilled person in doubt as to which technical features it refer[red] to" (decision, point 12.1).
7.2 The appellant argued that the skilled person would be able to attribute a technical meaning to that expression at least in contrast to the expression "deep machine learning model" that is also used in claim 1. Support in the description for the expression "wide machine learning model" was to be found in paragraphs [13], [37], [38] and [52] (statement of grounds of appeal, page 3).
7.3 The board agrees with the examining division that claim 1 is unclear, Article 84 EPC, because of the expression "wide machine learning model", which does not appear to have an established meaning in the art.
The appellant's argument is not convincing as the term "wide" is not the opposite of the term "deep" and there is thus no reason to interpret "wide machine learning model" as any machine learning model that is not a "deep machine learning model" (as apparently suggested by the appellant).
Such an understanding of "wide machine learning model" would also not be consistent with the description, in particular the passages cited by the appellant itself. While no general definition is provided for that expression, it is said in paragraph [37] that a wide machine learning model is a "wide and shallow model". Hence, being "not deep" ("shallow") is not sufficient to be a "wide machine learning model". This is also evident when looking at the advantages considered to be associated with the use of a "wide machine learning model" in paragraph [13]: it is not plausible that merely being not "deep" would be sufficient for any machine learning model to have them. The only concrete class of machine learning models disclosed is that of "generalized linear models" (paragraph [37]). It is not clear which other types of models are meant to be encompassed by the expression "wide machine learning model" in claim 1, and which ones are not encompassed by it (e.g. graphical models?). The scope of claim 1 is thus not clear, Article 84 EPC. The same objection applies to the other independent claims.
8. The objection of the examining division against claim 9 concerning the expression "for instance" (decision, point 12.2) has become moot due to the amendment made to that claim in all requests.
Main request - Inventive step
9. The examining division considered that all claims lacked an inventive step, Article 56 EPC, as they failed to solve a technical problem over the disclosure of D1, the distinguishing features of claim 1 over D1 only providing a solution to the non-technical problem of "how to improve/modify the mathematical model used in D1" (decision, point 13).
10. The appellant did not disagree with the differentiating features of claim 1 over D1 identified by the examining division but considered, based on paragraph [13] of the description, that they "allow a reduction of the amount of memory needed and produce an efficient method of memorization" and solve the objective technical problem of how to achieve this technical effect (statement of grounds of appeal, pages 4 and 5).
Following T 697/17, points 5.2.3 and 5.2.4, and T 817/16, point 3.12, technicality might be determined by enquiring whether the non-technical features would have been formulated by a technical expert rather than a non-technical expert. In the present case, "it must be a 'technical expert' since neither 'a programmer as such' who is able to implement a non-technical requirements specification on a computer nor a mathematician would have known how to deal with the 'generalization' aspect in the problem and hence none of them would have come up with the distinguishing features of claim 1". Hence, the distinguishing features would have been formulated by a technical expert and must be considered in the assessment of inventive step (statement of grounds of appeal, pages 6-8).
Furthermore, the present invention might be considered to be a "technical implementation" in the meaning of the EPO Guidelines G-II, 3.3, since it affected the input sample storage architecture of the overall architecture, as derivable from paragraph [13] (statement of grounds of appeal, pages 8 and 9).
Additional advantages of the invention were that it was able to deal with mixed kinds of data (paragraphs [35]-[38] and [46]-[48]) and that it could be implemented as a distributed system, e.g. in various cloud services (statement of grounds of appeal, page 9).
11. The board considers that the system of claim 1 does not make any technical contribution over a conventional general-purpose computer and therefore lacks an inventive step, Articles 52(1) and 56 EPC, for this reason alone.
The board notes that this line of argument was already developed in the written opinion of the International Searching Authority (WOISA, point 2.1) in parallel to the objection starting from D1, the key issue (technicality) being essentially the same.
11.1 The system of claim 1 differs from a conventional general-purpose computer only in the "instructions" stored on it, which amount to a computer program in the meaning of Article 52(2)(c) EPC. The question is whether this computer program contributes to a technical effect.
11.2 The method realised by this computer program takes an abstract input ("machine learning input comprising a plurality of features") and produces an abstract output ("a predicted output for the machine learning input"), neither of which has an inherent technical character.
The steps of that method involve the processing of abstract data by a "deep" and a "wide" machine learning model followed by a "combining layer". Like a deep neural network and a generalized linear model, these are abstract computational models of a mathematical nature with no inherent technical character. That the two models have been jointly trained on training data using backpropagation does not impart any technical character to them. In particular, it is not derivable from the claim that any of these models, or their combination, has been trained to perform a particular technical function.
It is noted that even the two application examples disclosed in the description and mentioned at point 5 above (word prediction and product/app recommendation) do not appear to be technical applications.
11.3 The board does not follow the appellant's argument, which essentially amounts to considering the invention an efficient storage method.
The combination of the deep and wide models applied to an input does not result in any particular form of storage of that input, but in a prediction (e.g. a product recommendation). Nor does any of the trained wide models represent a compressed encoding of the corresponding training data. It is not envisaged that the training data can be in any way reconstructed from the trained models. The trained models are only predictive models derived from the training data.
The board understands the notion of "memorization" used in paragraph [13] in relation to wide models (involving cross-product feature transformations) as an abstract one, relating to learning and to taking into account in the prediction co-occurrences in the historical data ("feature interactions").
11.4 As regards a technical implementation, claim 1 does not go beyond an implementation of the method by means of corresponding "instructions", i.e. as a computer program. It does not involve a distributed computing environment or cloud services, and so the corresponding argument of the appellant is not relevant.
11.5 The alleged fact that the invention would be able to deal with mixed kinds of data, e.g. numerical and categorical data, does not concern a technical property and would thus not imply a technical contribution, even if it were derivable from claim 1 (it is not).
11.6 As to the argument that only a "technical expert" could have devised the features of the invention, the board notes, as a general word of caution, that this kind of enquiry may be helpful in some cases to separate non-technical features from technical features - in particular to identify business-related features - but does not constitute a definitive test, as it only concerns which kind of considerations underlie some features of the invention and not which kind of effects are achieved by it. For instance, a claim to a computer-implemented simulation may involve features which are based on expertise in the technical field of the technical system that is being simulated. This alone would however not be sufficient to conclude that these features contribute to the technical character of the claim (G 1/19, reasons 122, 125, 141 and 142).
In any case, the board tends to consider that claim 1 does not reflect any considerations beyond computer programming (which encompasses the design of algorithms) and mathematics.
12. Additionally, the board notes that it does not find fault with the objection raised by the examining division starting from D1. The appellant has not objected to the differentiating features identified in point 13.1.1 of the decision, and they are not considered to solve any technical problem over D1 for the same reasons as those given above.
Auxiliary request 1
13. Claim 1 of auxiliary request 1 differs from claim 1 of the main request only in that it comprises the additional feature "wherein the wide machine learning model (106) is a generalized linear model (132)".
14. Even if the amendment were accepted as overcoming the objection under Article 84 EPC raised at point 7.3 above, the objection under Article 56 EPC still applies, a generalized linear model being an abstract model with no inherent technical character.
Auxiliary request 2
15. Claim 1 of auxiliary request 2 differs from claim 1 of the main request in the following amended features:
- "a machine learning input comprising a plurality of input features";
- "a deep machine learning model (104) configured to process a first set of features included in the machine learning input [deleted: the features] to generate a deep model intermediate predicted output";
- "a wide machine learning model (106) configured to (i) apply a cross-product feature transformation to a subset of a second set of features included in the machine learning input to generate transformed features and (ii) to process, using a generalized linear model, the transformed features and the input features in the second set [deleted: the features] to generate a wide model intermediate predicted output".
16. Even if the amendments were accepted as overcoming the objection under Article 84 EPC raised at point 7.3 above, the objection under Article 56 EPC still applies, the amendments only providing further mathematical details of the abstract model with no inherent technical character.
It is noted that the use of a "cross-product feature transformation" in the "wide machine learning model" may contribute to establishing that the advantages recited in paragraph [13] are actually achieved, as emphasised by the appellant (statement of grounds of appeal, pages 10 and 11), but these advantages remain of a non-technical nature.
Order
For these reasons it is decided that:
The appeal is dismissed.