
Mitigating carbon footprint for knowledge distillation based deep learning model compression

Deep learning techniques have recently demonstrated remarkable success in numerous domains. Typically, the success of these deep learning models is measured in terms of performance metrics such as accuracy and mean average precision (mAP). A model's high performance is highly valued, but it frequently comes at the expense of substantial energy costs and carbon footprint emissions during the model-building step. Massive emission of CO2 has a deleterious impact on life on earth in general and is a serious ethical concern that is largely ignored in deep learning research. In this article, we focus on environmental costs and the means of mitigating carbon footprints in deep learning models, with a particular focus on models created using knowledge distillation (KD). Deep learning models typically contain a large number of parameters, resulting in a 'heavy' model. A heavy model scores high on performance metrics but is incompatible with mobile and edge computing devices. Model compression techniques such as knowledge distillation enable the creation of lightweight, deployable models for these low-resource devices. KD generates lighter models that typically perform with slightly lower accuracy than the heavier teacher model (teacher accuracy on CIFAR 10, CIFAR 100, and TinyImageNet is 95.04%, 76.03%, and 63.39%; KD accuracy is 91.78%, 69.7%, and 60.49%). Although the distillation process makes models deployable on low-resource devices, it was found to consume an exorbitant amount of energy and to have a substantial carbon footprint (15.8, 17.9, and 13.5 times more carbon than the corresponding teacher model). The enormous environmental cost is primarily attributable to tuning the hyperparameter Temperature (τ). In this article, we propose measuring the environmental costs of deep learning work (in terms of GFLOPS in millions, energy consumption in kWh, and CO2 equivalent in grams). To create lightweight models with low environmental costs, we propose a straightforward yet effective method for selecting the hyperparameter τ using a stochastic approach for each training batch fed into the models. We applied knowledge distillation (including its data-free variant) to image classification and object detection problems. To evaluate the robustness of our method, we ran experiments on various datasets (CIFAR 10, CIFAR 100, Tiny ImageNet, and PASCAL VOC) and models (ResNet18, MobileNetV2, Wrn-40-2). Our approach reduces environmental costs by a large margin by eliminating the requirement of expensive hyperparameter tuning without sacrificing performance. Empirical results on the CIFAR 10 dataset show that the stochastic technique achieves an accuracy of 91.67%, whereas tuning achieves 91.78%, while the stochastic approach reduces the energy consumption and CO2 equivalent each by a factor of 19. Similar results were obtained on the CIFAR 100 and TinyImageNet datasets. The same pattern is observed for object detection on the PASCAL VOC dataset, where the tuning technique performs similarly to the stochastic technique, with a difference of 0.03% mAP favoring the stochastic technique, while the stochastic approach reduces energy consumption and CO2 emissions each by a factor of 18.5.
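The abstract's core idea, replacing an expensive grid search over the distillation temperature τ with a fresh random draw for every training batch, can be illustrated with a minimal sketch. The snippet below is not the authors' released code: the PyTorch framing, the uniform sampling range `tau_range`, the loss weight `alpha`, and the `train_epoch` helper are illustrative assumptions, built around the standard Hinton-style KD loss that the abstract's description implies.

```python
import random

import torch
import torch.nn.functional as F


def kd_loss(student_logits, teacher_logits, labels, tau, alpha=0.9):
    """Hinton-style KD loss: softened KL term plus hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * (tau * tau)  # rescale so gradient magnitude is comparable across tau
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard


def train_epoch(student, teacher, loader, optimizer, tau_range=(1.0, 10.0)):
    """One distillation epoch with a stochastic temperature per batch."""
    teacher.eval()
    student.train()
    for images, labels in loader:
        # Stochastic temperature: draw a new tau for this batch instead of
        # repeating full training runs to grid-search one global value.
        tau = random.uniform(*tau_range)
        with torch.no_grad():
            teacher_logits = teacher(images)
        student_logits = student(images)
        loss = kd_loss(student_logits, teacher_logits, labels, tau)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

In a full experiment, the GFLOPS, kWh, and CO2-equivalent figures quoted above would be logged around such a training loop; the abstract does not name a measurement tool, but an energy tracker such as CodeCarbon could serve that purpose.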

Related Results

Introducing a new national tool to monitor the carbon footprint of inhalers
Abstract Introduction The Welsh Government launched its National Health Service (NHS) Wales decarbonisation strategic delivery p...
A Comprehensive Review of Distillation in the Pharmaceutical Industry
Distillation processes play a pivotal role in the pharmaceutical industry for the purification of active pharmaceutical ingredients (APIs), intermediates, and solvent recovery. Thi...
The carbon footprint cost of travel to Canadian Urological Association conferences
Introduction: Canadian Urological Association (CUA) conferences are held annually across Canada. Guests from across the world attended, contributing to the overall carbon footprint...
Differential Diagnosis of Neurogenic Thoracic Outlet Syndrome: A Review
Abstract Thoracic outlet syndrome (TOS) is a complex and often overlooked condition caused by the compression of neurovascular structures as they pass through the thoracic outlet. ...
Combined Knowledge Distillation Framework: Breaking Down Knowledge Barriers
Knowledge distillation, one of the most prominent methods in model compression, has successfully balanced small model sizes and high performance. However, it has been obse...
Questions and Answers in the Negative Footprint Illusion Paradigm: A Reply to Gorissen et al. (2024)
When asked to estimate the carbon footprint of a bundle of relatively low carbon footprint items and relatively high carbon footprint items, people typically report a lower value c...
Deep convolutional neural network and IoT technology for healthcare
Background Deep Learning is an AI technology that trains computers to analyze data in an approach similar to the human brain. Deep learning algorithms can find complex patterns in ...
Carbon footprint of Russia: realities and prospects of economic development
The article deals with the key aspects of the problem of determining the “carbon footprint” of industrial production. Rapidly increasing greenhouse gas emission within the past two...