<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Multi-modal Interactive Perception in Human Control of Complex Objects</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>05/29/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10465773</idno>
					<idno type="doi">10.1109/ICRA48891.2023.10160375</idno>
					<title level='j'>IEEE International Conference on Robotics and Automation (ICRA 2023)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Rashida Nayeem</author><author>Salah Bazzi</author><author>Mohsen Sadeghi</author><author>Reza Sharif Razavian</author><author>Dagmar Sternad</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Tactile sensing has been increasingly utilized in robot control of unknown objects to infer physical properties and optimize manipulation. However, there is limited understanding about the contribution of different sensory modalities during interactive perception in complex interaction both in robots and in humans. This study investigated the effect of visual and haptic information on humans’ exploratory interactions with a ‘cup of coffee’, an object with nonlinear internal dynamics. Subjects were instructed to rhythmically transport a virtual cup with a rolling ball inside between two targets at a specified frequency, using a robotic interface. The cup and targets were displayed on a screen, and force feedback from the cup-and-ball dynamics was provided via the robotic manipulandum. Subjects were encouraged to explore and prepare the dynamics by “shaking” the cup-and-ball system to find the best initial conditions prior to the task. Two groups of subjects received the full haptic feedback about the cup-and-ball movement during the task; however, for one group the ball movement was visually occluded. Visual information about the ball movement had two distinctive effects on the performance: it reduced preparation time needed to understand the dynamics and, importantly, it led to simpler, more linear input-output interactions between hand and object. The results highlight how visual and haptic information regarding nonlinear internal dynamics have distinct roles for the interactive perception of complex objects.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>When a child shakes a present before opening it on Christmas morning, they can quickly guess what they received. Humans exhibit exquisite skill in perceiving objects through exploratory interactions <ref type="bibr">[1]</ref>. This includes rattling boxes to gauge their contents, or squeezing fruits to feel their ripeness. This human ability of interactive perception, i.e., using forceful interactions with an object to gain information, has recently received substantial attention in the robotics community <ref type="bibr">[2]</ref>.</p><p>Interactive perception of non-rigid objects with internal degrees of freedom, such as sloshing liquids in containers, is of paramount interest to robotics <ref type="bibr">[3]</ref>- <ref type="bibr">[7]</ref>. Recent approaches in robotic manipulation have leveraged this interactive approach to both obtain information about the object and then to subsequently manipulate it <ref type="bibr">[8]</ref>- <ref type="bibr">[10]</ref>. However, visual information processing is extremely costly and the integration of different sensory information in robotic systems presents major computational challenges. Therefore, most control policies have relied exclusively on haptic or tactile signals to infer properties of the objects. For example, when grasping different rigid and non-rigid objects, tactile information was shown to enable successful manipulation <ref type="bibr">[8]</ref>. Yet, we conjecture, integrating multiple streams of information could potentially lead to more informed control schemes.</p><p>Advances in robotics have been inspired by human research showing that information obtained through exploratory actions improves manipulation strategies <ref type="bibr">[11]</ref>, <ref type="bibr">[12]</ref>. 
Humans routinely integrate haptic, acoustic and visual information for successful manipulation, but each of these information sources may have differing impacts on behavior.</p><p>Although it has been understood that humans are 'vision-dominant' <ref type="bibr">[13]</ref>, studies on manipulation have emphasized the intricate interplay of haptic and visual information <ref type="bibr">[14]</ref>. However, these studies have focused on how humans reach to or handle rigid objects without complex internal dynamics. Only two previous studies on the manipulation of a linear mass-spring system examined the role of haptic information and reported that it is necessary for dexterous performance <ref type="bibr">[15]</ref>, <ref type="bibr">[16]</ref>. How humans use both visual and haptic information to explore and manipulate objects with nonlinear internal dynamics, e.g., a cup of coffee, is still unknown. As robots aim to dexterously manipulate complex objects, it is useful to understand how humans interactively perceive and utilize different information channels for manipulation.</p><p>This experimental study is the first to investigate how different sensory modalities affect humans' ability to gain information about an object with nonlinear internal dynamics through interactive perception. In previous research, Sternad and colleagues have examined human control of an object with nonlinear internal dynamics inspired by 'carrying a cup of coffee' <ref type="bibr">[17]</ref>- <ref type="bibr">[22]</ref>. Using a virtual environment, subjects interacted with a cup-and-ball system, visualized on a screen, via a robotic manipulandum that moved the cup and also provided haptic feedback about the internal ball forces back to the user's hand. The dynamics of the cup-and-ball can evolve into complex and potentially chaotic behavior. Studies have found that during interaction humans aim to make cup-and-ball dynamics simpler, i.e., more predictable. A recent study by Nayeem et al. 
investigated how humans explored and prepared this system prior to a continuous rhythmic transport task <ref type="bibr">[23]</ref>, <ref type="bibr">[24]</ref>. Results showed that subjects interactively prepared the object for the upcoming task: by 'jiggling' the system back and forth, they learned which initial states resulted in shorter transients and thus reached a more predictable steady state faster.</p><p>Using the same experimental paradigm, this paper explored how visual and haptic information about the internal dynamics affected subjects' exploration strategies, i.e., their interactive perceptual strategies. Two experimental conditions were implemented in a virtual environment: the first condition presented full haptic and visual feedback; the second condition occluded visual information about the ball dynamics. Subjects' interactive perceptual actions were quantified by their ability to converge to a control strategy that minimized transients and reached a steady state faster. Results showed that without visual information about internal dynamics, subjects required more time for exploration and excited the system with a wider range of frequencies, yet were less likely to find the optimal solution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. METHODS</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Experimental Task, Apparatus, and Data Acquisition</head><p>Subjects interacted with a 'cup of coffee' simulated in a virtual environment. Simulating a 3D cup with sloshing coffee would be computationally expensive and was not a viable option for real-time virtual rendering. Therefore, the task was simplified to transporting a 2D semicircular cup moving on a horizontal line with a ball sliding inside the cup (Fig. <ref type="figure">1</ref>). Since the ball was sliding instead of rolling, the system was mechanically equivalent to a cart sliding on a frictionless line with a suspended frictionless pendulum. The pendulum bob was represented by the ball, and the cup position corresponded to the cart position. 
The arc of the cup corresponded to the rotational path of the pendulum bob, i.e., the ball (Fig. <ref type="figure">1</ref>). While simplified, the task retained the basic challenges of transporting a cup of coffee: underactuation and nonlinear internal dynamics. The equations of motion for the system are:</p><p>F_inter is the force applied by the human hand on the cup. X and &#952; are cup position and ball angle, respectively. The ball angle was defined as 0deg when the ball was at the bottom of the cup; the clockwise direction was negative. F_ball denotes the force that the ball exerts on the cup. Parameters used to simulate the system were: cup mass m_c=2.4kg, ball mass m_b=1.0kg, pendulum length l=0.45m, and gravitational acceleration g. The masses were chosen to be heavy enough for the subjects to feel F_ball upon their hand, but light enough to avoid fatigue. Subjects interacted with the virtual cup-and-ball via a robotic manipulandum capable of haptic force feedback (HapticMaster, Fig. <ref type="figure">1C</ref>) <ref type="bibr">[25]</ref>. The cup was shown as a yellow arc and a small white circle rolling inside represented the ball. The subject grasped the robot handle to control the displacement of the cup, X, shown on a screen 2m in front of them. The robot was admittance-controlled: the subject's force, F_inter, on the manipulandum resulted in cup displacement, according to Eqs. <ref type="bibr">(1)</ref>-<ref type="bibr">(2)</ref>. A custom-written C++ program based on the HapticAPI computed the cup and ball kinematics that then controlled the cup and ball on the visual display. The ball force F_ball was haptically fed back to the subject's hand at an update rate of 120Hz. Corresponding to the horizontal cup movement, the movement of the robot handle was also restricted to a horizontal line. Two blue rectangular target boxes delimited the cup's peak-to-peak amplitude for the instructed back-and-forth movements (Fig. <ref type="figure">1D</ref>). 
The cup's rim was at &#177;50deg; the ball could not 'escape' from the cup, but if it exceeded &#177;50deg, it would rotate above the cup rim.</p></div>
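<div xmlns="http://www.tei-c.org/ns/1.0"><p>For orientation, the cart-and-pendulum model described in this section is commonly written in the form below. This is a sketch under stated assumptions, not a transcription of the paper's Eqs. (1)-(2): the ball angle &#952; is assumed to be measured from the downward vertical with counterclockwise positive, and F_ball is assumed to be the horizontal reaction force of the ball on the cup, which follows from Newton's third law for a frictionless contact. The paper's own grouping of terms and sign convention may differ.</p>
<ab><![CDATA[
```latex
% Assumed standard cart-pendulum form (symbols as defined in the text)
\begin{align}
(m_c + m_b)\,\ddot{X} &= m_b\, l \left(\dot{\theta}^{2}\sin\theta - \ddot{\theta}\cos\theta\right) + F_{\mathrm{inter}} \\
l\,\ddot{\theta} &= -\ddot{X}\cos\theta - g\sin\theta
\end{align}
% Equivalent cup-only balance, with the ball force made explicit:
\begin{align}
m_c\,\ddot{X} &= F_{\mathrm{inter}} + F_{\mathrm{ball}}, &
F_{\mathrm{ball}} &= -m_b\,\frac{d^{2}}{dt^{2}}\left(X + l\sin\theta\right)
\end{align}
```
]]></ab>
<p>Both forms follow from the Lagrangian of a point mass on a massless rod attached to a horizontally sliding cart; expanding the second derivative in the last expression recovers the first equation.</p></div>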
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Experimental Conditions and Task Instructions</head><p>Two groups of 9 healthy college-aged subjects each performed one of the following two conditions: in the Full Information condition subjects interacted with the simulated cup and ball, while receiving full visual and haptic feedback (Fig. <ref type="figure">1D</ref>). In the Hidden Dynamics condition a yellow rectangle covered the system. Subjects were unaware they were manipulating a cup with a ball rolling inside (Fig. <ref type="figure">1E</ref>). Only the ball force F_ball acting upon their hand via the manipulandum provided haptic information from which to infer the object's dynamics.</p><p>At the start of each trial, the cup was positioned in Box A with the ball at rest (0deg). Subjects were instructed to move the cup rhythmically between Box A and Box B (0.3m apart) for 15s, paced by an auditory metronome that was set to 0.60Hz (Fig. <ref type="figure">1C</ref>). Prior to starting the prescribed rhythmic movement, subjects were encouraged to explore and prepare the cup-and-ball dynamics by 'jiggling' the cup. This interactive preparation interval was not limited in time, but the cup motions were constrained to the left half of the screen (Fig. <ref type="figure">1D</ref>). They were told to find a preparation strategy that would allow them to complete the prescribed rhythmic task to the best of their ability. Once subjects felt ready, they moved the cup towards Box B (0.15m) and continued moving rhythmically between the two boxes (Fig. <ref type="figure">2A</ref>) at a pace of 0.60Hz. The metronome began when the participant reached Box B for the first time. The experiment consisted of 120 trials for each condition, which lasted 40 minutes. All subjects gave written informed consent, as approved by the Institutional Review Board of Northeastern University. De-identified data are publicly available online <ref type="bibr">[26]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Performance Metrics</head><p>The effects of interactive perception were first assessed by the initial ball states that participants adopted before starting the rhythmic task. The duration of the transients in each trial measured how quickly subjects reached a steady state, and therefore more predictable dynamics <ref type="bibr">[23]</ref>, <ref type="bibr">[24]</ref>. These two metrics served as indicators of how well subjects had inferred the object's dynamics via preparatory activity. To further examine whether the two experimental conditions (Full Information and Hidden Dynamics) elicited different preparatory activity, the frequency components in the preparation interval were also analyzed. In addition, system identification was applied to characterize the activity serving interactive perception. All calculations were performed in Matlab (Mathworks, v.29b, Natick MA).</p><p>1) Initial Conditions: Initial conditions &#952;_0 and &#952;&#775;_0 were determined at the instant when the subject started the cup movement towards Box B (0.15m), i.e., the final zero cup velocity before reaching Box B (Fig. <ref type="figure">2A</ref>). The states that defined the system's initial conditions were ball angle &#952;_0 and ball velocity &#952;&#775;_0; cup position X_0 was not included in further analyses as it was at the center of Box A. Movements before this time point were considered preparation.</p><p>2) Transient Duration: As the rhythmic cup movement began, the cup-and-ball system exhibited a transient prior to reaching a steady state. To calculate the duration of this transient, the steady state for the system had to be defined first. For rhythmic cup movements at the metronome frequency of 0.60Hz, the cup and ball positions were in phase (Fig. <ref type="figure">2B</ref>). 
To compute the end of the transient and start of the steady state, the instantaneous phases of the cup and ball positions were calculated using the Hilbert transform <ref type="bibr">[27]</ref>. The relative phase between cup and ball, i.e., the difference between the two phase signals, served to indicate when the system entered a steady state. A relative phase less than 27deg (15% of &#177;180deg) for the rest of the trial marked the end of the transient and the start of steady state <ref type="bibr">[23]</ref>, <ref type="bibr">[24]</ref>. The time between the initial conditions and start of steady state defined the transient duration.</p><p>Given the known dynamics of the system, forward dynamic simulations were run to evaluate which initial conditions led to shorter transients. These simulations required an input force to the cup-and-ball system, i.e., a controller. As a simple choice, the control input was the desired rhythmic cup trajectory X_des(t) = (A/2)sin(2&#960;ft + &#960;/2) coupled to a hand impedance, i.e., a linear spring K in parallel with a damper B <ref type="bibr">[23]</ref>, <ref type="bibr">[24]</ref>, <ref type="bibr">[28]</ref>. The equations of motion of the coupled model include (<ref type="formula">1</ref>)-(<ref type="formula">2</ref>) with F_inter expressed as:</p><p>The hand impedance acted as a proportional-derivative controller that reduced any divergence of X(t) from X_des(t) due to ball forces F_ball <ref type="bibr">[19]</ref>. Stiffness K and damping B were constants; their respective values were estimated from each experimental trial using an optimization method <ref type="bibr">[19]</ref>, <ref type="bibr">[23]</ref>, <ref type="bibr">[24]</ref>. For the forward simulations, the mean constant values were used: K=40N/m and B=20Ns/m. 
To evaluate the effect of initial ball states on the transient duration, the cup-and-ball system was forward-simulated for different &#952;_0 (&#177;90deg, 1deg step size) and &#952;&#775;_0 (&#177;150deg/s, 1deg/s step size) to produce a heat map of transient durations.</p><p>3) Preparation Interval: Participants had no time limit for their preparatory interactions, hence the duration of this interval was also informative. The preparation interval was the time from the start of the trial to the point where initial conditions were determined. To characterize preparatory activity, cup position was parsed into cycles and their frequencies determined. All frequencies of a trial were pooled across subjects and binned into 0.02Hz bins. The frequencies were then grouped into 10-trial intervals and summarized in a time-frequency plot, which showed the changes in preparation cup frequencies across practice.</p><p>System identification methods were used to characterize input-output behavior during preparation. Linearizing Eqs. (1)-(2) around the pendulum's downward position, a 4th-order transfer function described the system dynamics with interaction force as an input and cup position as an output. Therefore, a 4th-order linear transfer function was fit between interaction force (input) and cup position (output), using tfest.m (Mathworks, v.29b; <ref type="bibr">[29]</ref>- <ref type="bibr">[31]</ref>). Thirty trials were used for each fit so that the system identification had sufficient training data. This tested whether subjects linearized dynamic behavior during preparation. To check sensitivity to model order, functions of 3rd- and 5th-order were also fitted. Specifically, each transfer function was fit between the time series of the interaction force and cup position for the first and last 30 trials of each individual to assess if there was a change in preparation activity with practice. 
The fitting error was calculated as the mean squared error (MSE) between experimental output (cup position) and the model estimation for the corresponding input (interaction force).</p><p>4) Statistical Analyses: The performance metrics (initial conditions, transient duration, duration of the preparation interval) showed approximately exponential trends across trials. Therefore, the averaged data were fit with exponential functions, and the time constant &#964; indicated the rate of convergence to the final value; R^2 specified the goodness of fit. As initial ball velocity exhibited poor fits to exponential functions (R^2 &lt; 0.10), linear functions were used. For each metric, the difference between early (first 5 trials) and late (last 5 trials) performance within groups was compared using paired t-tests. Unpaired t-tests quantified the differences across groups. For all tests p &lt; 0.05 was considered significant. Finally, to identify differences in the quality of fit from the system identification procedure, a regression analysis (general linear model) was used with model order, early versus late practice, feedback condition and subject as fixed factors <ref type="bibr">[32]</ref>. All individual trials were fed into the regression model.</p></div>
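<div xmlns="http://www.tei-c.org/ns/1.0"><p>The forward simulations described in this section can be sketched as follows. This is a minimal illustration, not the authors' code: the cart-pendulum equations are written in an assumed standard form (ball angle measured from the downward vertical, counterclockwise positive), the impedance law F_inter = K(X_des - X) + B(dX_des/dt - dX/dt) is an assumed reading of "a linear spring K in parallel with a damper B", g = 9.81 m/s^2 is assumed, and the cup is assumed to start at rest at X_des(0). The function names and the fixed-step Runge-Kutta integrator are choices made for this sketch only.</p>
<ab><![CDATA[
```python
import math

# Values quoted in the text; G is an assumed standard value (the text only says "g").
M_C, M_B, L, G = 2.4, 1.0, 0.45, 9.81   # cup mass, ball mass, pendulum length, gravity
K, B = 40.0, 20.0                        # hand stiffness (N/m) and damping (Ns/m)
AMP, FREQ = 0.30, 0.60                   # peak-to-peak amplitude (m), metronome (Hz)


def desired(t):
    """Desired cup position X_des(t) = (A/2) sin(2*pi*f*t + pi/2) and its velocity."""
    w = 2.0 * math.pi * FREQ
    phase = w * t + 0.5 * math.pi
    return 0.5 * AMP * math.sin(phase), 0.5 * AMP * w * math.cos(phase)


def derivs(t, s, use_controller=True):
    """Time derivative of the state s = (X, theta, Xdot, thetadot)."""
    x, th, xd, thd = s
    f = 0.0
    if use_controller:
        # ASSUMED impedance law: spring and damper acting on the tracking error.
        x_des, v_des = desired(t)
        f = K * (x_des - x) + B * (v_des - xd)
    sn, cs = math.sin(th), math.cos(th)
    # ASSUMED cart-pendulum equations, solved for the two accelerations:
    #   (m_c + m_b) Xdd = m_b l (thd^2 sin(th) - thdd cos(th)) + F_inter
    #   l thdd = -Xdd cos(th) - g sin(th)
    xdd = (f + M_B * sn * (L * thd * thd + G * cs)) / (M_C + M_B * sn * sn)
    thdd = -(xdd * cs + G * sn) / L
    return (xd, thd, xdd, thdd)


def rk4_step(t, s, dt, use_controller):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))
    k1 = derivs(t, s, use_controller)
    k2 = derivs(t + 0.5 * dt, shifted(k1, 0.5 * dt), use_controller)
    k3 = derivs(t + 0.5 * dt, shifted(k2, 0.5 * dt), use_controller)
    k4 = derivs(t + dt, shifted(k3, dt), use_controller)
    return tuple(si + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))


def simulate(theta0_deg, thetadot0_deg, duration=15.0, dt=0.001, use_controller=True):
    """Integrate from an initial ball state (deg, deg/s); returns the list of states."""
    s = (0.5 * AMP, math.radians(theta0_deg), 0.0, math.radians(thetadot0_deg))
    traj = [s]
    for i in range(int(round(duration / dt))):
        s = rk4_step(i * dt, s, dt, use_controller)
        traj.append(s)
    return traj


def energy(s):
    """Total mechanical energy; constant when no hand force acts (sanity check)."""
    x, th, xd, thd = s
    kinetic = (0.5 * (M_C + M_B) * xd * xd + M_B * L * xd * thd * math.cos(th)
               + 0.5 * M_B * L * L * thd * thd)
    return kinetic - M_B * G * L * math.cos(th)
```
]]></ab>
<p>Calling simulate for each pair of initial ball angle and velocity on a grid, and then measuring the transient duration of each run, would yield a heat map of the kind described in the text; the transient detection itself is not included in this sketch.</p></div>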
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. RESULTS</head><p>Subjects in both Full Information and Hidden Dynamics conditions performed the rhythmic task as instructed at the metronome frequency of 0.60Hz and with an amplitude conforming to the distance between the target boxes (0.30m). There was no apparent change in amplitude or frequency over the 120 trials. The average frequencies and standard deviations across trials and subjects were 0.59&#177;0.001Hz and 0.60&#177;0.012Hz in the two conditions. The average movement amplitudes were 0.32&#177;0.003m and 0.31&#177;0.003m.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Initial Conditions</head><p>In the Full Information condition, the initial ball angle &#952;_0 clearly converged to preferred values with practice, while the initial ball velocity &#952;&#775;_0 decreased linearly (Fig. <ref type="figure">3A,</ref><ref type="figure">B</ref>). The exponential and linear fits are shown by solid black lines. The colored data denote the mean and one standard deviation for each trial number across 9 subjects. The average ball angle &#952;_0 decreased with a time constant &#964;=7.11 trials to an asymptote of -23deg. Average values dropped from -6.64 &#177; 13.27deg in the first 5 trials to -21.59 &#177; 7.17deg in the last 5 trials (t(8)=2.8, p=0.02). Ball velocity &#952;&#775;_0 declined linearly with a slope of -0.29deg/s per trial and an intercept of -10deg/s; there was also a significant difference between the first 5 trials (-13.49 &#177; 28.64deg/s) and the last 5 trials (-44.57 &#177; 33.64deg/s); t(8)=2.75, p=0.02; Fig. <ref type="figure">3</ref>.</p><p>Subjects in the Hidden Dynamics group were not aware that they were moving a cup with a ball inside, as it was occluded by a solid rectangle (see Fig. <ref type="figure">1E</ref>). However, F_ball acted on the hand via the robot handle. Subjects converged to a preferred &#952;_0, despite being deprived of visual information about the ball. The average &#952;_0 values declined exponentially, with &#964;=19.89; convergence to the final value of -20deg was slower in this group (Fig. <ref type="figure">3C</ref>). The initial ball angle changed from -7.45 &#177; 10.59deg in the first 5 trials to -20.66 &#177; 9.66deg in the last 5 trials (t(8)=2.47, p=0.04). The final &#952;_0 values in the Full Information and Hidden Dynamics conditions were not significantly different (t(16)=-0.28, p=0.79). For the Hidden Dynamics group, &#952;&#775;_0 was highly variable and showed no significant trend across the experiment (Fig. <ref type="figure">3D</ref>). 
Values were 7.29 &#177; 20.51deg/s in the first 5 trials and 0.18 &#177; 42.66deg/s in the last 5 trials (t(8)=0.40, p=0.69). Initial ball velocity in the last 5 trials differed significantly between Full Information and Hidden Dynamics (t(16)=-2.53, p=0.02).</p></div>
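<div xmlns="http://www.tei-c.org/ns/1.0"><p>The early-versus-late comparisons reported above are paired t-tests across the 9 subjects of a group, which is why they carry 8 degrees of freedom. The statistic can be sketched as below. The function name and the example values in the usage note are hypothetical and are not data from the study; the p-value lookup is left out because it needs a t-distribution routine outside the Python standard library.</p>
<ab><![CDATA[
```python
import math
from statistics import mean, stdev


def paired_t(early, late):
    """Paired t statistic and degrees of freedom for per-subject early/late values."""
    if len(early) != len(late):
        raise ValueError("early and late must have one value per subject")
    diffs = [e - l for e, l in zip(early, late)]
    n = len(diffs)
    # Mean within-subject difference divided by its standard error.
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1
```
]]></ab>
<p>For example, paired_t([1, 2, 3, 4], [0, 0, 1, 1]) uses four made-up subjects and returns a statistic with 3 degrees of freedom; with 9 subjects per group the same call would return 8.</p></div>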
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Transients</head><p>Following the convergence to preferred initial states, it was expected that transient durations would decrease with practice. The average transient durations across all subjects in the Full Information condition declined with a decay constant of &#964;=26.06 trials to an asymptote of 2.76s (Fig. <ref type="figure">4A</ref>). In the first 5 trials the average duration was 12.04 &#177; 2.38s, decreasing to 3.62 &#177; 1.43s by the last 5 trials (t(8)=9.57, p=1.73 &#215; 10^-8). Subjects with Hidden Dynamics also shortened their transients with a similar time constant of &#964;=21.14 trials to a final value of 5.49s (Fig. <ref type="figure">4B</ref>). The average duration in the first 5 trials, 11.24 &#177; 2.55s, decreased to 6.88 &#177; 3.67s by the last 5 trials (t(8)=3.07, p=0.006). However, transient durations with Hidden Dynamics were not shortened to the same degree as with Full Information. The values achieved in the last 5 trials were significantly different in the two conditions (t(16)=2.94, p=0.009).</p><p>To evaluate the effect of the initial conditions of cup and ball on the transients, we used the simple control model to simulate the cup-and-ball dynamics. The objective was to compare which of the two perceptual conditions was closer to achieving optimal preparation. Fig. <ref type="figure">4C</ref>,D both show the same heat map of simulated transient duration for a range of initial ball angles &#952;_0 and initial ball velocities &#952;&#775;_0. Yellow areas indicate initial ball states that produced the shortest transients in simulation. As illustrated, the initial ball states that produced transient durations &lt; 0.1s ranged between -26.97deg and -18.38deg (&#952;_0) and between -157.13deg/s and -65.46deg/s (&#952;&#775;_0). 
The center was at &#952;_0=-22.67deg and &#952;&#775;_0=-111.3deg/s.</p><p>All experimental data from the Full Information and Hidden Dynamics conditions were overlaid onto the same simulated landscape (Fig. <ref type="figure">4C</ref>,D respectively). Subjects with visual and haptic feedback chose &#952;_0, &#952;&#775;_0 that produced transients close to the optimum in simulation. The cyan star shows the center of the data at &#952;_0=-22.17deg and &#952;&#775;_0=-27.68deg/s. Subjects without visual feedback were further away from the optimal states. The center of the data, shown by the red star, was at &#952;_0=-17.57deg and &#952;&#775;_0=-3.43deg/s. These results indicate that when provided only with haptic information about the internal dynamics, subjects could find a mapping between initial states and simplified dynamic behavior. However, for convergence to the global solution, subjects also required visual information about the internal dynamics. To understand why the preferred initial states differed between the two groups, the preparatory activity was analyzed further.</p></div>
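<div xmlns="http://www.tei-c.org/ns/1.0"><p>The time constants and asymptotes quoted in the Results come from exponential fits to trial-averaged data. One simple way to obtain such a fit is sketched below. It is an illustration only and not the authors' Matlab procedure: the model form y(n) = c + a exp(-n/&#964;), the grid scan over candidate time constants, and the function name are all assumptions made for this sketch.</p>
<ab><![CDATA[
```python
import math


def fit_exponential(y, taus):
    """Fit y[n] = c + a*exp(-n/tau) for trials n = 1..N.

    tau is chosen from the candidate list `taus`; for each candidate the
    amplitude a and asymptote c follow from ordinary linear least squares.
    """
    n_trials = len(y)
    y_mean = sum(y) / n_trials
    best = None
    for tau in taus:
        e = [math.exp(-(i + 1) / tau) for i in range(n_trials)]
        e_mean = sum(e) / n_trials
        sxx = sum((ei - e_mean) ** 2 for ei in e)
        sxy = sum((ei - e_mean) * (yi - y_mean) for ei, yi in zip(e, y))
        a = sxy / sxx
        c = y_mean - a * e_mean
        sse = sum((yi - (c + a * ei)) ** 2 for ei, yi in zip(e, y))
        if best is None or sse < best[0]:
            best = (sse, tau, a, c)
    sse, tau, a, c = best
    sst = sum((yi - y_mean) ** 2 for yi in y)
    return {"tau": tau, "amplitude": a, "asymptote": c, "r2": 1.0 - sse / sst}
```
]]></ab>
<p>On synthetic, noise-free data generated with a known time constant, the scan recovers that constant whenever it lies on the candidate grid; with measured data the grid spacing limits the resolution of &#964;, so a finer grid or a nonlinear optimizer would be the natural refinement.</p></div>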
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Characterization of Preparatory Activity</head><p>One indicator of whether the preparatory activity changed with practice was the duration of the preparation interval (Fig. <ref type="figure">5A,</ref><ref type="figure">B</ref>). The duration in the Full Information condition was 17.69 &#177; 10.61s on average in the first 5 trials and decreased to 5.76 &#177; 5.04s in the last 5 trials (t(8)=3.34, p=0.008; Fig. <ref type="figure">5A</ref>). The time constant of an exponential function fit was &#964;=16.09. Subjects in the Hidden Dynamics condition required more trials to converge to a final value (&#964;=48.41; Fig. <ref type="figure">5B</ref>), but their preparation interval duration also decreased significantly from 23.42 &#177; 13.46s to 5.40 &#177; 1.80s (t(8)=3.88, p=0.004).</p><p>Cycle-by-cycle frequencies during the preparation interval, summarized as histograms in Fig. <ref type="figure">5C,</ref><ref type="figure">D</ref>, revealed that in both conditions, subjects initially explored a wide array of frequencies from 0.50-0.80Hz. Across trials, subjects narrowed this range to frequencies around 0.60Hz, coincident with the subsequent metronome frequency. It is notable that this distribution was slightly wider in Hidden Dynamics.</p><p>This small but visible difference in 'jiggling' frequencies motivated further analysis using system identification. Fig. <ref type="figure">6A</ref>,B illustrate time series of interaction forces and the resulting cup positions for all trials in the preparation interval for two example subjects. The subject in the Hidden Dynamics condition exhibited more variability in preparation frequencies than the subject in Full Information. To capture the specific dynamic behavior, system identification methods were applied to fit 3rd-, 4th- and 5th-order linear transfer functions to the data. Goodness of fit, quantified by the MSE values, is summarized in Fig. <ref type="figure">6C</ref>. 
The MSE values of the fits to the training data, averaged across subjects, are shown for the first 30 and last 30 trials and for the three transfer functions. Regression analyses found a significant difference between the two experimental conditions (p &lt; 0.001), and between early and late practice (p &lt; 0.001). In Full Information, the decrease in MSE from early to late training was significant (t(16) &gt; 2.5, p &lt; 0.03, for all orders of linear fits), demonstrating that subjects learned to simplify or linearize their dynamic behavior in the preparation interval (Fig. <ref type="figure">6C</ref>). In contrast, in the Hidden Dynamics condition, the change in MSE values from early to late practice was not significant and indicated that subjects did not learn to produce more linearized input-output behavior (t(16) &lt; 1.49, p &gt; 0.17, for all orders of linear fits). Comparison between the MSE values in the early stage of training between the two conditions showed no significant difference (t(16) &lt; 0.86, p &gt; 0.4, for all orders of linear fits). However, comparison of values in later trials showed that MSE values in Full Information were significantly lower than those in Hidden Dynamics (t(16) &gt; 2.1, p &lt; 0.05 for all orders of linear fits). Subjects in the Hidden Dynamics condition excited complex dynamic modes and could not linearize dynamics, and therefore could not simplify behavior.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. DISCUSSION</head><p>This paper investigated the effect of visual and haptic information in humans when exploring and preparing a complex object for a transport task. Examples of the challenges that manipulation of complex objects poses abound, for both humans and robotic systems, ranging from opening a box with unknown contents to carrying a cup filled with liquid without spilling <ref type="bibr">[33]</ref>, <ref type="bibr">[34]</ref>. This study compared interactive strategies to identify the effect of visual and haptic feedback about the internal dynamics. 
With full visual and haptic information, subjects successfully explored the system's dynamics to converge relatively quickly to optimal initial conditions for the task. Transients significantly decreased as a consequence and subjects reached a steady state faster. Without visual information, convergence to initial conditions was less optimal and transients did not decrease to the same degree. Analysis of preparation activity revealed that with unrestricted information, subjects achieved linear mappings between interaction forces and the resulting cup position.</p><p>Perhaps the most striking difference between the two conditions was their transient durations (Fig. <ref type="figure">4A,</ref><ref type="figure">B</ref>). Shorter transients and, hence, longer steady state behavior are desirable, as in steady state the system exhibits predictable dynamics. This difference is likely the result of different preparatory actions, as the length of the preparation interval differed accordingly. Multi-modal sensory feedback not only aided the identification of object dynamics, but also facilitated learning of simpler control strategies. Subjects with unrestricted information were able to simplify preparatory behavior of the cup-and-ball by eliciting linear dynamics. This finding is in line with existing human movement studies, which showed that augmenting visual feedback with haptic information in a ball bouncing task enhanced subjects' learning of open-loop stable control strategies <ref type="bibr">[35]</ref>, <ref type="bibr">[36]</ref>. Furthermore, we believe that the dual role of haptic feedback, in carrying both information and mechanical power, may have enhanced the perception and learning of interaction dynamics. Leveraging this feature of haptic feedback may be useful for robotic applications and warrants further investigation.</p><p>Subjects without visual information about the internal dynamics did not converge to a preferred initial ball velocity (Fig. 
<ref type="figure">3D</ref>). While not a singular contributor to performance, this absence of convergence to a specific value indicated that subjects were not able to estimate ball velocity solely using force feedback. This result showed that different sensors can observe different states. For a robotic task, when full state observability is crucial, a multi-modal sensory stream would be advantageous.</p><p>In the absence of vision, the decline in the duration of the preparation interval was also much slower (Fig. 5A,B). This was accompanied by a similarly slow convergence of the range of frequencies employed in the preparation interval (Fig. 5C,D), implying that without visual information, subjects needed more interactions with the object to identify its properties and dynamics. With an eye to robotics, a multi-modal sensory stream could potentially facilitate more efficient object property estimation and learning of a manipulation skill. This could lead to more agile robots in manufacturing, military and healthcare settings.</p><p>In robotics the vast majority of approaches that model learning through object interaction have only used one mode of data: either visual or tactile feedback <ref type="bibr">[37]</ref>- <ref type="bibr">[43]</ref>. However, the results presented here highlight the importance of equipping a learning system with multiple sensory modalities. There is no one sensor, whether visual or haptic, that alone is adequate for learning the dynamics of an object. The integration of multi-modal data facilitates more efficient and robust interactive perception. From a robot control perspective, this has its own challenges given the heterogeneous nature of the data and their different dimensions, frequencies, and characteristics. Therefore, more investigation is warranted.</p><p>This study only investigated the visual and haptic sensory modalities. 
Yet, humans incorporate an even richer array of sensory modalities including proprioception, vestibular, auditory and olfactory feedback. In robotics, it would be useful to explore the value of adding a wider set of modalities to go beyond vision and tactile sensing, such as auditory and pressure sensors. For future work, we would like to expand our set of experimental conditions to investigate the effects of a wider range of sensory modalities and their integration.</p></div></body>
		</text>
</TEI>
