In this paper, a reinforcement learning method is used to tune the parameters of a non-linear reactive control model for a two-body point absorber Ocean Wave Energy Converter (OWEC). In particular, an Actor-Critic algorithm, a model-free method, is adopted to maximize energy extraction while adapting to the sea state. Different values of the Power Take-Off (PTO) control parameters are applied to the system to observe the reward or penalty of each action taken. The reward is determined by the average power over a specific time horizon lasting several wave periods. A two-body point absorber, simulated in WEC-Sim, is developed as the agent in order to validate the control strategy under different wave conditions. Results for the analyzed sea states verify that the proposed non-linear control law learns the optimal PTO control parameters in the specified sea states.
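As a rough illustration of the reward described above, the following sketch averages sampled instantaneous power over a horizon spanning several wave periods. The function name `average_power_reward`, the default of five periods, and the use of a plain arithmetic mean over uniformly sampled data are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np


def average_power_reward(power, dt, wave_period, n_periods=5):
    """Return the mean power over the last `n_periods` wave periods.

    power       : 1-D sequence of instantaneous power samples (W)
    dt          : sampling interval of `power` (s)
    wave_period : representative wave period of the sea state (s)
    n_periods   : horizon length in wave periods (assumed value)
    """
    power = np.asarray(power, dtype=float)
    # Number of samples covered by the averaging horizon.
    n_samples = int(round(n_periods * wave_period / dt))
    if n_samples < 1 or n_samples > power.size:
        raise ValueError("horizon must cover between 1 and len(power) samples")
    # Average only the most recent window, so earlier transients are ignored.
    return float(power[-n_samples:].mean())


# Hypothetical usage: 100 s of power samples at 10 Hz for an 8 s wave period.
samples = np.full(1000, 2.0)
reward = average_power_reward(samples, dt=0.1, wave_period=8.0)
```

Averaging over several wave periods, rather than a single sample, keeps the reward from fluctuating with the phase of the wave at the moment an action is evaluated.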