DA-VIL: Adaptive Dual-Arm Manipulation with Reinforcement Learning and Variable Impedance Control

IIIT Hyderabad, Brown University, University of Tartu, Estonia
Equal contribution

Abstract

Dual-arm manipulation is an area of growing interest in the robotics community. Enabling robots to perform tasks that require the coordinated use of two arms is essential for complex manipulation tasks such as handling large objects, assembling components, and performing human-like interactions. However, achieving effective dual-arm manipulation is challenging due to the need for precise coordination, dynamic adaptability, and the ability to manage interaction forces between the arms and the objects being manipulated. We propose a novel pipeline that combines the advantages of policy learning from environment feedback with gradient-based optimization to learn the controller gains required for the control outputs. This allows the robotic system to dynamically modulate its impedance in response to task demands, ensuring stability and dexterity in dual-arm operations. We evaluate our pipeline on a trajectory-tracking task involving a variety of large, complex objects with different masses and geometries, and compare its performance against three other established methods for controlling dual-arm robots, demonstrating superior results.

Architecture

Overview of the proposed method: (a) The pipeline shows how the policy network uses observations \(O_{t}\), reward \(R_{t}\), and time and mass embeddings to predict the stiffness \(K_{t}\). This stiffness, along with state variables from MuJoCo and the reference trajectory, is fed to the QP solver, which outputs joint accelerations \(\ddot{\mathbf{q}}^{\star}_{L}\) and \(\ddot{\mathbf{q}}^{\star}_{R}\). These accelerations are then converted to torques \(\boldsymbol{\tau}_{L}\) and \(\boldsymbol{\tau}_{R}\) and applied in the MuJoCo simulator. (b) The QP solver, implemented with CVXPY, computes the impedance and postural errors from the provided stiffness and reference trajectory and solves the QP problem to determine the joint accelerations. (c) The trajectory generation process uses a quintic polynomial for Cartesian positions and SLERP for \(SO(3)\) orientations. The trajectory includes a waypoint whose \(x\) and \(y\) coordinates are the averages of the initial and goal positions, and whose \(z\) coordinate is set to a random height within the workspace of the arms to introduce variability.
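
The stiffness-prediction step in (a) can be pictured as a small network head that fuses observations with time and mass embeddings. Below is a minimal PyTorch sketch, not the paper's implementation: the observation size, embedding width, hidden sizes, and the stiffness bounds k_min/k_max are all illustrative assumptions.

import torch
import torch.nn as nn

class StiffnessPolicy(nn.Module):
    """Illustrative policy head: observations plus time and mass
    embeddings mapped to per-axis stiffness gains (sizes assumed)."""
    def __init__(self, obs_dim=24, emb_dim=8, k_dim=6,
                 k_min=10.0, k_max=500.0):
        super().__init__()
        self.time_emb = nn.Linear(1, emb_dim)   # scalar time -> embedding
        self.mass_emb = nn.Linear(1, emb_dim)   # scalar mass -> embedding
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 2 * emb_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, k_dim),
        )
        self.k_min, self.k_max = k_min, k_max

    def forward(self, obs, t, mass):
        z = torch.cat([obs, self.time_emb(t), self.mass_emb(mass)], dim=-1)
        # Squash to a bounded, positive stiffness range for stability.
        return self.k_min + (self.k_max - self.k_min) * torch.sigmoid(self.net(z))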
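For the QP step in (b), a single-arm sketch with CVXPY might look like the following. The impedance law, the critically damped gains \(D = 2\sqrt{K}\), the postural attractor, and all weights and limits are assumptions for illustration; the paper's solver handles both arms with its own objective and constraint set.

import numpy as np
import cvxpy as cp

def solve_arm_qp(J, Jdot_qdot, x_err, xdot_err, K, q, qdot, q_rest,
                 qddot_max=20.0, w_postural=1e-2):
    """Single-arm QP sketch (illustrative, not the paper's exact solver).

    J         : (6, n) end-effector Jacobian
    Jdot_qdot : (6,)  drift term J_dot @ q_dot
    x_err     : (6,)  task-space pose error (reference minus current)
    xdot_err  : (6,)  task-space velocity error
    K         : (6,)  stiffness gains predicted by the policy
    """
    n = J.shape[1]
    D = 2.0 * np.sqrt(K)                   # critically damped gains (assumption)
    xddot_des = K * x_err + D * xdot_err   # impedance law: desired task acceleration

    qddot = cp.Variable(n)
    task_residual = J @ qddot + Jdot_qdot - xddot_des
    # Postural term: gently pull joints toward a rest configuration.
    postural_residual = qddot - (-10.0 * (q - q_rest) - 2.0 * qdot)

    objective = cp.Minimize(cp.sum_squares(task_residual)
                            + w_postural * cp.sum_squares(postural_residual))
    constraints = [cp.abs(qddot) <= qddot_max]
    cp.Problem(objective, constraints).solve(solver=cp.OSQP)
    return qddot.value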
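The acceleration-to-torque conversion mentioned in (a) is standard inverse dynamics, which MuJoCo exposes directly. A minimal sketch, assuming qddot_star stacks both arms' QP solutions over the model's degrees of freedom:

import mujoco

def accelerations_to_torques(model, data, qddot_star):
    # Inverse dynamics: tau = M(q) qddot + C(q, qdot) qdot + g(q).
    data.qacc[:] = qddot_star
    mujoco.mj_inverse(model, data)   # writes the result into data.qfrc_inverse
    return data.qfrc_inverse.copy()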
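The trajectory generation in (c) combines quintic time scaling with SLERP. Here is a sketch using SciPy; the piecewise-linear spatial blend through the waypoint is a simplification of the caption's scheme, and z_range stands in for the arms' workspace limits in z.

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def quintic_profile(t, T):
    # Quintic (minimum-jerk) time scaling with zero boundary velocity/acceleration.
    s = t / T
    return 10 * s**3 - 15 * s**4 + 6 * s**5

def make_trajectory(p0, p1, R0, R1, z_range, T=5.0, steps=100, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    # Waypoint: x and y are averages of start and goal; z is a random
    # height inside the (assumed) workspace bounds z_range.
    wp = np.array([(p0[0] + p1[0]) / 2.0, (p0[1] + p1[1]) / 2.0,
                   rng.uniform(*z_range)])
    slerp = Slerp([0.0, 1.0], Rotation.concatenate([R0, R1]))
    trajectory = []
    for t in np.linspace(0.0, T, steps):
        s = quintic_profile(t, T)
        # Piecewise-linear spatial blend through the waypoint, driven by
        # the quintic time scaling (a simplification of the caption's scheme).
        pos = p0 + 2 * s * (wp - p0) if s < 0.5 else wp + (2 * s - 1) * (p1 - wp)
        trajectory.append((pos, slerp([s])[0]))
    return trajectory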

Video Explanation

More Results

BibTeX

@misc{davil2024,
    title={DA-VIL: Adaptive Dual-Arm Manipulation with Reinforcement Learning and Variable Impedance Control},
    author={Md Faizal Karim and Shreya Bollimuntha and Mohammed Saad Hashmi and Autrio Das and Gaurav Singh and Srinath Sridhar and Arun Kumar Singh and Nagamanikandan Govindan and K Madhava Krishna},
    year={2024}
}