Real-time control of battery power is a dynamic process in which input variables, such as state of charge (SOC), state of health (SOH), and other battery specifications, vary continuously. This article explains how reinforcement learning (RL) algorithms can be used in the EV ecosystem and includes a case study on using RL to extend EV battery life.
RL is a machine learning training method that rewards desired behaviors and punishes undesired ones. It enables an agent to perceive and interpret its environment, take action, and learn through trial and error.
Figure 1 shows the cyclic behavior of an RL algorithm. In this loop, the agent takes an action based on its present state and the input it receives from the environment, weighs the reward it expects, and moves to the next state accordingly.
The “next state” now acts as feedback in the closed-loop system. The environment refers to everything outside the agent that the agent interacts with. The environment provides the agent with observations about its current state and rewards based on the agent’s actions.

Figure 1. An illustration of the iterative process in RL. (Image: Rakesh Kumar, Ph.D.)
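The interaction in Figure 1 can be written as a short loop. The sketch below is a minimal Python illustration, assuming hypothetical agent and environment objects rather than any particular RL library; it simply mirrors the cycle of state, action, reward, and next state described above.

```python
# Minimal sketch of the RL interaction cycle in Figure 1.
# "environment" and "agent" are hypothetical placeholder objects,
# not part of any specific RL library.

def run_episode(environment, agent, max_steps=1000):
    state = environment.reset()                   # initial observation of the environment
    for _ in range(max_steps):
        action = agent.select_action(state)                  # act based on the present state
        next_state, reward, done = environment.step(action)  # environment responds
        agent.learn(state, action, reward, next_state)        # trial-and-error update
        state = next_state                        # the "next state" feeds back as input
        if done:
            break
```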
How can RL be applied in EVs?
RL can be applied in various ways across the EV ecosystem. Figure 2 gives examples of how the “States,” “Rewards,” “Constraints,” and “RL algorithms” can be defined. The “States” can be EV battery specifications such as SOC, charging efficiency, and SOH.
“Rewards” can be improved battery life, shorter charging time, and lower operating costs. Boundary conditions on the charging process are placed under “Constraints.” RL algorithms come in several variants, such as Q-learning, SARSA, and W-learning.

Figure 2. Various ways of implementing RL in EV. (Image: IEEE Access)
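As a concrete illustration, the sketch below shows one possible way to encode the states, rewards, and constraints from Figure 2 for a single charging agent in Python. The field names, limits, and reward weights are assumptions made for illustration and are not taken from the cited studies.

```python
from dataclasses import dataclass

@dataclass
class ChargingState:
    soc: float                  # state of charge, 0.0 to 1.0
    soh: float                  # state of health, 0.0 to 1.0
    charging_efficiency: float  # instantaneous charging efficiency, 0.0 to 1.0

# Example constraints (boundary conditions); values are illustrative assumptions.
MAX_CHARGING_POWER_KW = 50.0    # charger power limit
MIN_SOC = 0.1                   # avoid deep discharge

def reward(state: ChargingState, charging_time_h: float, energy_cost: float) -> float:
    """Reward longer battery life, shorter charging time, and lower operating cost."""
    return 1.0 * state.soh - 0.5 * charging_time_h - 0.1 * energy_cost
```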
Many of the RL algorithms used here, such as Q-learning, are model-free: they learn directly from observed transitions rather than from an explicit model of the system. The algorithm is dynamic in nature because it iterates repeatedly, with the “next state” fed back as input. As an online learning method, RL keeps incorporating new data, so its behavior stays current with present requirements. This makes it well suited to EV charging applications, where the data changes continuously and the rewards can also change occasionally.
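A tabular Q-learning update, one of the variants named above, illustrates this model-free, online behavior: the agent never builds an explicit model of the charger or battery and simply keeps refining its value estimates as new transitions arrive. The discretization and hyperparameters below are illustrative assumptions, not values from the cited work.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1       # learning rate, discount, exploration rate
ACTIONS = [0.0, 10.0, 25.0, 50.0]            # example charging power levels in kW
Q = defaultdict(float)                       # Q[(state, action)] -> value estimate

def choose_action(state):
    """Epsilon-greedy choice over the discrete power levels; the state is assumed
    to be a hashable discretization, e.g. an integer SOC bucket."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One model-free Q-learning step using only the observed transition."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```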
Case study
A research study was carried out in collaboration with various universities in the Republic of Korea. RL, specifically a deep Q-network (DQN), was used to optimize the driving profile of EVs. The study’s objective was to increase EV battery life while accounting for the vehicle ahead during driving.
As with any RL method, a state, an action, and a reward had to be defined. Three factors made up the state:
1. The current speed of the vehicle
2. The relative speed between the two vehicles
3. The distance between the test vehicle and the vehicle ahead
The agent’s action is acceleration: the agent can accelerate or decelerate the vehicle depending on the input factors. The reward function balances vehicle energy efficiency, battery life, and the distance between vehicles.
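The sketch below shows how this three-element state, acceleration action, and composite reward might map onto a small deep Q-network in Python with PyTorch. The network size, discrete acceleration levels, and reward weights are assumptions made for illustration; the architecture used in the published study will differ.

```python
import torch
import torch.nn as nn

ACCELERATIONS = [-2.0, -1.0, 0.0, 1.0, 2.0]    # discrete actions in m/s^2 (assumed)

# Small Q-network: maps the 3-element state to one Q-value per acceleration level.
q_network = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, len(ACCELERATIONS)),
)

def select_acceleration(speed, relative_speed, gap_distance):
    """Greedy action: pick the acceleration with the highest predicted Q-value."""
    state = torch.tensor([speed, relative_speed, gap_distance], dtype=torch.float32)
    with torch.no_grad():
        q_values = q_network(state)
    return ACCELERATIONS[int(q_values.argmax())]

def reward(energy_efficiency, capacity_loss, gap_distance, target_gap=30.0):
    """Composite reward: favor efficiency and battery life while keeping a safe gap."""
    return energy_efficiency - 10.0 * capacity_loss - 0.01 * abs(gap_distance - target_gap)
```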
Figure 3 shows the research results, with the Federal Test Procedure-75 (FTP-75) used as the standard driving cycle for emissions and fuel-economy testing. The cases in Figure 3 refer to different sections of the FTP-75 driving cycle, each denoted by a time window; here, each case covered a duration of 120 s.

Figure 3. Energy efficiency and battery capacity loss simulations of the two models. (Image: Hindawi)
The DQN-based A model applies RL, while the A model serves as the baseline. The bar chart shows that the energy efficiency of the EV battery increased in every case except Case 2, with Case 3 showing the most significant improvement of 15.39%.
Likewise, except for Case 2, the use of RL reduced EV battery capacity loss. Case 3 again improved the most, with a 29.14% reduction in capacity loss.
Summary
The significance of RL in EV battery charging comes from its model-free approach, which handles uncertainty without requiring an explicit system model. Because the algorithm continually seeks higher rewards, it can adapt and steer the learning system toward more favorable outcomes.
However, RL is not restricted to a single vehicle, as in the case study. The algorithm has a much broader scope in EV infrastructure when vehicle-to-grid (V2G) integration is considered. Such systems offer plenty of reward opportunities, but they also impose many constraints when designing RL for EVs connected to an electric grid.
References
- Driving Profile Optimization Using a Deep Q-Network to Enhance Electric Vehicle Battery Life, Hindawi
- Reinforcement learning for electric vehicle applications in power systems: A critical review, Elsevier
- Reinforcement Learning Based EV Charging Management Systems–A Review, IEEE Access