Synthetic minority oversampling technique - Wikipedia - Recent changes [en]


Clarify sample oversampling, and minor copy editing

Revision as of 18:15, 16 July 2025
Line 1:
{{Short description|Statistical oversampling method}}
In statistics, '''synthetic minority oversampling technique (SMOTE)''' is a method for oversampling the minority class when the classes of a classification dataset are imbalanced. Whereas undersampling, another method used for imbalanced datasets, reduces the number of majority-class samples, SMOTE enlarges the minority class by generating new synthetic minority samples.<ref>{{Citation |last=Chawla |first=N. V. |title=SMOTE: Synthetic Minority Over-sampling Technique |date=2011-06-09 |url=http://arxiv.org/abs/1106.1813 |access-date=2025-07-16 |publisher=arXiv |doi=10.48550/arXiv.1106.1813 |id=arXiv:1106.1813 |last2=Bowyer |first2=K. W. |last3=Hall |first3=L. O. |last4=Kegelmeyer |first4=W. P.}}</ref><ref name=":0">{{Cite journal |last=Chawla |first=N. V. |last2=Bowyer |first2=K. W. |last3=Hall |first3=L. O. |last4=Kegelmeyer |first4=W. P. |date=2002-06-01 |title=SMOTE: Synthetic Minority Over-sampling Technique |url=https://www.jair.org/index.php/jair/article/view/10302 |journal=Journal of Artificial Intelligence Research |language=en |volume=16 |pages=321–357 |doi=10.1613/jair.953 |issn=1076-9757}}</ref>

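The SMOTE paper describes each synthetic sample as a random point along the line segment that joins a minority sample to one of its k nearest minority-class neighbors. As a formula, with notation chosen for this illustration rather than taken from the source (x_i is the minority sample, x_nn the randomly chosen neighbor, and λ a random number between 0 and 1):

```latex
\mathbf{x}_{\text{new}} = \mathbf{x}_i + \lambda \left( \mathbf{x}_{\text{nn}} - \mathbf{x}_i \right), \qquad \lambda \sim \mathcal{U}[0, 1]
```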

== Algorithm ==
Line 45:
* <code>Populate()</code> is the generating function for new synthetic minority samples

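A minimal Python sketch of what <code>Populate()</code> does for a single minority sample, assuming each sample is a plain list of numeric features and that its k nearest minority-class neighbors have already been found. The function name <code>populate</code> and its parameters are hypothetical, chosen for this illustration. The sketch draws one random gap per synthetic sample, so every new point lies on the line segment between the sample and the randomly chosen neighbor.

```python
import random


def populate(sample, neighbors, n_new, rng=None):
    """Create n_new synthetic samples from one minority sample.

    Hypothetical helper: `sample` is a list of numeric features and
    `neighbors` holds its k nearest minority-class neighbors.
    """
    if rng is None:
        rng = random.Random()
    synthetic = []
    for _ in range(n_new):
        neighbor = rng.choice(neighbors)  # pick one of the k neighbors at random
        gap = rng.random()                # uniform random number in [0, 1)
        # Move from the sample towards the neighbor by the fraction `gap`.
        synthetic.append([s + gap * (n - s) for s, n in zip(sample, neighbor)])
    return synthetic


# Example: a 2-D sample with two neighbors, three synthetic samples.
points = populate([1.0, 2.0], [[3.0, 2.0], [1.0, 4.0]], 3, random.Random(0))
print(points)
```

Each of the three returned points lies either on the segment from (1, 2) to (3, 2) or on the segment from (1, 2) to (1, 4).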

If N is less than 100%, the minority class samples are first shuffled at random, and SMOTE is applied only to a random subset containing N% of them.

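The N below 100% case can be sketched as follows, assuming N is given as an integer percentage and the minority samples are held in a Python list. <code>select_for_smote</code> is a hypothetical helper. When N is below 100 it shuffles the minority samples, keeps N% of them and creates one synthetic sample for each. Otherwise it keeps all of them and creates N/100 synthetic samples per sample, rounded down.

```python
import random


def select_for_smote(minority, n_percent, rng=None):
    """Return (samples to oversample, synthetic samples per selected sample).

    Hypothetical helper; `n_percent` is the amount of SMOTE N as an
    integer percentage.
    """
    if rng is None:
        rng = random.Random()
    if n_percent < 100:
        shuffled = list(minority)
        rng.shuffle(shuffled)                    # randomize the minority samples
        keep = len(shuffled) * n_percent // 100  # only N% of them are used
        return shuffled[:keep], 1                # one synthetic sample each
    return list(minority), n_percent // 100      # N/100 per sample, rounded down


selected, per_sample = select_for_smote(list(range(10)), 30, random.Random(1))
print(selected, per_sample)
```

With ten minority samples and N = 30%, three randomly chosen samples are returned, each to receive one synthetic sample.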

== Variations ==