Training convolutional neural networks (CNNs) often requires a considerable amount of computational resources. In recent years, several studies have proposed CNN inference and training accelerators, and FPGAs have demonstrated good performance and energy efficiency for this purpose. Accelerating CNN processing places further demands on memory bandwidth, FPGA platform resource usage, execution time, and power consumption. In addition, CNN training requires large datasets and substantial computational power, and it is constrained by the need for improved hardware acceleration to scale beyond existing data and model sizes. In this study, we propose a procedure for energy-efficient CNN training on an FPGA-based accelerator. We employ optimizations such as quantization, a common model compression technique, to speed up the CNN training process. Additionally, a gradient accumulation buffer is used to maximize operating efficiency while preserving the gradient-descent behavior of the learning algorithm. To validate our design, we implemented the AlexNet and VGG16 models on an FPGA board and on a laptop CPU and GPU. Our design achieves 203.75 GOPS with the AlexNet model and 196.50 GOPS with the VGG16 model on the Terasic DE1-SoC, which, as far as we know, outperforms existing FPGA-based accelerators. Compared to the CPU and GPU, our design is 22.613X and 3.709X more energy efficient, respectively.
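The abstract mentions two optimizations, quantization and a gradient accumulation buffer. The following is a minimal software sketch of how these two ideas interact in a training step, not the authors' hardware design: the fixed-point format (16 bits with 8 fractional bits), the accumulation interval, and all names are assumptions chosen for illustration.

    import numpy as np

    def quantize(x, frac_bits=8, total_bits=16):
        # Hypothetical signed fixed-point format: round to the nearest
        # representable value, clip to the signed range, then return the
        # dequantized value the fixed-point hardware would actually hold.
        scale = 2.0 ** frac_bits
        qmin = -(2 ** (total_bits - 1))
        qmax = 2 ** (total_bits - 1) - 1
        return np.clip(np.round(x * scale), qmin, qmax) / scale

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(4, 8))   # weights of a toy linear layer
    grad_buf = np.zeros_like(W)              # gradient accumulation buffer
    ACCUM_STEPS = 4                          # assumed accumulation interval
    lr = 0.01

    for step in range(16):
        x = rng.normal(size=(8,))
        target = rng.normal(size=(4,))
        Wq = quantize(W)                     # forward pass uses quantized weights
        y = Wq @ x
        err = y - target                     # grad of 0.5*||y - target||^2 w.r.t. y
        grad_buf += np.outer(err, x)         # accumulate into the buffer
        if (step + 1) % ACCUM_STEPS == 0:
            W -= lr * grad_buf / ACCUM_STEPS # apply the averaged update
            grad_buf.fill(0.0)               # clear the buffer for the next window

In a hardware realization, such a buffer would typically hold partial gradient sums at higher precision between weight updates, so that the quantized datapath does not degrade gradient descent; the update interval and precisions shown here are illustrative only.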