Abstract
Humans communicate with each other through language, which enables us to talk about things beyond time and space. Do non-human animals learn to associate human speech with specific objects in everyday life? We examined whether cats matched familiar cats’ names and faces (Exp.1) and human family members’ names and faces (Exp.2). Cats were presented with a photo of a familiar cat’s face on a laptop monitor after hearing that cat’s name or another cat’s name called by the subject cat’s owner (Exp.1) or an experimenter (Exp.2). Half of the trials were in a congruent condition, in which the name and face matched, and half were in an incongruent (mismatch) condition. Results of Exp.1 showed that household cats paid attention to the monitor for longer in the incongruent condition, suggesting an expectancy violation effect; however, café cats did not. In Exp.2, the larger the human family a cat lived with, the longer it looked at the monitor in the incongruent condition. Furthermore, this tendency was stronger among cats that had lived with their human family for a longer time, although we could not rule out an effect of age. This study provides evidence that cats link a companion’s name and corresponding face without explicit training.
Introduction
Many human words have referential meanings: they evoke a visual mental image when heard or read1. For example, the word “apple” causes us to imagine a red or green fruit even if no such fruit is present. This language property, which expands the plasticity of communication, is also seen to some extent in non-human animals, mainly in the context of intraspecific vocal communication. Seyfarth, Cheney and Marler reported that vervet monkeys (now called Chlorocebus pygerythrus) responded differently to different types of alarm calls2 (although some of the calls overlap acoustically3 and this view is currently debated4). More recently, West African green monkeys (Chlorocebus sabaeus) rapidly learned the novel referent of an alarm call that was given in response to a drone5. Referential signaling is not limited to primates. Suzuki showed that Japanese tits (Parus minor) detected snake-like motion more rapidly when a snake-specific alarm call rather than a general alarm call was played back, suggesting that tits recall things to which at least one specific call refers6. Such studies show that animals have specific calls with a referential meaning, increasing the likelihood of responses appropriate for survival.
In contrast to studies dealing with life-or-death-related issues and ecology, some studies have reported that companion animals understand human utterances in more neutral situations and use them in communication with us [e.g., dogs (Canis lupus familiaris)7,8,9,10,11,12]. Dogs in particular have been studied in this context; for example, a few “expert” dogs trained in object-name fetching over several months remembered hundreds of object names and fetched the correct object upon verbal command7,8,12. According to a recent report, “gifted” dogs learned object names after only a few exposures during social interactions, whereas the majority of dogs did not show such object–name association learning despite intensive training12.
Similar to dogs, cats (Felis catus) are one of the most widespread companion animals in the world13. Although the ancestral Libyan wildcat (Felis lybica) is a solitary species14, many domestic cats live with humans and show evidence of social cognitive abilities concerning humans. They can use human pointing cues15 and gaze cues16 to find food. They also discriminate between human facial expressions17,18,19 and attentional states20,21,22, and identify their owner’s voice23. Furthermore, they cross-modally match their owner’s voice with a photo of the owner’s face presented on a screen24, and human emotional sounds with the corresponding expressions19.
Cats have been shown to distinguish their own name from another familiar cat’s name in a habituation–dishabituation procedure25, and to distinguish those names from general nouns. Interestingly, cats living in multi-cat households habituated less to their companion cats’ names than to other nouns. Conceivably, therefore, cats might also recognize the name of another cat living in the same household.
Here we examined whether cats link a human utterance and the corresponding object, using a relatively simple task that is applicable to many species: a visual–auditory expectancy violation task previously used to test cats’ ability to predict an object upon hearing that object’s name24. As stimuli we used the names of other cats (“models”) cohabiting with the subjects in Exp.1, and human family members’ names in Exp.2. Cats were presented with the face of the other cat (Exp.1) or human (Exp.2) following presentation of the model’s name, called by the owner (Exp.1) or an experimenter (Exp.2). Half of the trials were “congruent,” i.e., the model’s face and name matched, whereas the other half were “incongruent” (the stimuli mismatched). Previous research showed that cats matched human photos and voices24, which established the validity of presenting photos as stimuli. Our hypothesis was that cats learn face–name relationships by observing interactions involving their owner, and that more such observations lead to stronger learning. We tested two groups of cats that differed in the number of other cats they lived with: cats belonging to cat cafés, where many cats live together, and household cats. The latter probably have more opportunities to observe interactions between the owner and each of the other cohabiting cats, which might facilitate learning of the face–name relationship. Therefore, we analyzed data from household cats and cat café cats separately in Exp.1. In Exp.2, the analysis concerned the number of cohabiting human family members, because cats living with more people would have more opportunities to hear each member’s name (e.g., people living as a couple probably say each other’s names less often than people living in a larger family); we also considered the length of time the cat had lived with the family.
We made two predictions. First, attention toward the stimulus face displayed on the monitor should last longer in incongruent trials, owing to expectancy violation. Second, the magnitude of the violation effect should be related to the amount of exposure to relevant interactions; specifically, household cats should show a stronger violation effect than café cats in Exp.1, and cats living in households with more people should show more evidence of expectancy violation in Exp.2.
Experiment 1
Materials and methods
Subjects
We tested 48 cats (28 males and 20 females). Twenty-nine (17 males and 12 females; mean age 3.59 years, SD 2.71) lived in five “cat cafés” (mean number of cats living together: 14.2, SD 10.01), where visitors can freely interact with the cats. The other 19 (11 males and 8 females; mean age 8.16 years, SD 5.16) were household cats (mean number of cats living together: 6.37, SD 4.27). We tested only household cats living with at least two other cats, because the experiment required two cats as models. The two model cats were quasi-randomly chosen from the cats living with the subject, on the condition that they had cohabited with the subject for at least 6 months and had different coat colors, so that their faces could be more easily discriminated. We did not ask the owners to make any changes to water or feeding schedules.
Stimuli
For each subject, the visual stimuli were photos of two cats, other than the subject, that lived in the same household or café, and the auditory stimuli were recordings of the owner’s voice calling those cats’ names. We asked the owner to call each cat’s name as s/he would usually do, and recorded the calls using a handheld digital audio recorder (SONY ICD-UX560F, Japan) in WAV format (sampling rate 44,100 Hz, 16-bit resolution). Each call lasted about 1 s, depending on the length of the cat’s name (mean duration 1.04 s, SD 0.02). All sound files were adjusted to the same volume using version 2.3.0 of the Audacity(R) recording and editing software26. We took a digital color photo of each cat’s face, front-on and with a neutral expression, against a plain background (resolution range: x = 185 to 1039, y = 195 to 871 pixels); each photo was enlarged or shrunk to fit the monitor (12.3″ PixelSense™ built-in display).
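The normalization settings are not reported beyond the use of Audacity; as a minimal sketch of how the same-volume adjustment could be reproduced programmatically, assuming RMS normalization with the soundfile and numpy Python libraries (the file names and target level are hypothetical, not taken from the study):

```python
import numpy as np
import soundfile as sf

TARGET_RMS = 0.1  # arbitrary target level; an assumption, not from the paper

def normalize_rms(in_path: str, out_path: str, target_rms: float = TARGET_RMS) -> None:
    """Scale a WAV file so its root-mean-square amplitude matches target_rms."""
    data, rate = sf.read(in_path)            # 44,100 Hz, 16-bit files in this study
    rms = np.sqrt(np.mean(np.square(data)))
    if rms > 0:
        data = data * (target_rms / rms)
    data = np.clip(data, -1.0, 1.0)          # avoid clipping after scaling
    sf.write(out_path, data, rate, subtype="PCM_16")

# Hypothetical file names for the two model cats' recorded name calls
for name in ["model_cat_A_name.wav", "model_cat_B_name.wav"]:
    normalize_rms(name, name.replace(".wav", "_norm.wav"))
```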
Procedure
We tested cats individually in a familiar room. The cat was gently restrained by Experimenter 1, 30 cm in front of the laptop computer (Surface Pro 6, Microsoft) that controlled the auditory and visual stimuli. Each cat was tested in one session consisting of two phases. First, in the name phase, the model cat’s name was played back from the laptop’s built-in speaker four times, each playback separated by a 2.5-s inter-stimulus interval. During this phase the monitor remained black. Immediately after the name phase, the face phase began, in which a cat’s face appeared on the monitor for 7 s. The face photos measured ca. 16.5 × 16 cm on the monitor. While restraining the cat, Experimenter 1 looked down at its head; she never looked at the monitor and so was unaware of the test condition. When the cat was calm and oriented toward the monitor, Experimenter 1 started the name phase by pressing a key on the computer. She restrained the cat until the end of the name phase, and then released it. Some cats remained stationary, whereas others moved around and explored the photograph presented on the monitor. The trial ended after the 7-s face phase.
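As a summary of the trial timing only (the presentation software actually used on the laptop is not described), a minimal Python sketch with stubbed playback and display functions, under the assumption of hypothetical file names, might look like this:

```python
import time

NAME_REPETITIONS = 4      # the name was played four times
ISI_SEC = 2.5             # inter-stimulus interval between name playbacks
FACE_DURATION_SEC = 7.0   # face photo shown for 7 s

def play_name(wav_path: str) -> None:
    """Placeholder for playing the recorded name call through the built-in speaker."""
    print(f"playing {wav_path}")
    time.sleep(1.0)       # calls lasted ~1 s in this study

def show_face(photo_path: str, duration: float) -> None:
    """Placeholder for displaying the face photo on the otherwise black monitor."""
    print(f"showing {photo_path} for {duration} s")
    time.sleep(duration)

def run_trial(name_wav: str, face_photo: str) -> None:
    # Name phase: monitor stays black while the name is played four times.
    for i in range(NAME_REPETITIONS):
        play_name(name_wav)
        if i < NAME_REPETITIONS - 1:
            time.sleep(ISI_SEC)
    # Face phase: the (matching or mismatching) face appears immediately afterwards.
    show_face(face_photo, FACE_DURATION_SEC)

# Congruent trial: name and face belong to the same model cat (hypothetical files).
run_trial("model_cat_A_name_norm.wav", "model_cat_A_face.jpg")
```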
We conducted two congruent and two incongruent trials for each subject (Fig. 1), in pseudo-random order, with the restriction that the same vocalization was not repeated on consecutive trials. The inter-trial interval was at least 3 min. The subject’s behavior was recorded by three cameras (two GoPro HERO7 Black and one SONY FDR-X3000): one beside the monitor for a lateral view, one in front of the cat to measure time spent looking at the monitor, and one recording the entire trial from behind.
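The pseudo-random ordering is not specified further; one way to generate orders satisfying the stated constraint, assuming each model cat’s recorded name is used once per condition, is sketched below (the trial encoding is hypothetical):

```python
import itertools
import random

# Each trial is (name_played, face_shown); "A" and "B" are the two model cats.
# Congruent: name == face; incongruent: name != face.
TRIALS = [("A", "A"), ("B", "B"), ("A", "B"), ("B", "A")]

def valid_orders(trials):
    """All orderings in which the same vocalization is never played on consecutive trials."""
    for perm in itertools.permutations(trials):
        if all(perm[i][0] != perm[i + 1][0] for i in range(len(perm) - 1)):
            yield list(perm)

order = random.choice(list(valid_orders(TRIALS)))
print(order)  # e.g. [('A', 'A'), ('B', 'A'), ('A', 'B'), ('B', 'B')]
```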
Diagram illustrating each condition in Exp.1. Two model cats were chosen from the cats living with the subject. The model cat’s name, called by the owner, was played through the speaker built into the laptop computer (Name phase). Immediately after playback, a cat’s face appeared on the monitor (Face phase). On half of the trials the name and face matched (congruent condition); on the other half they mismatched (incongruent condition).
Analysis
One cat completed only the first trial before escaping from the room and climbing out of reach. For the face phase we measured the time spent attending to the monitor, defined as visual orientation toward, or sniffing of, the monitor. Trials in which the subject paid no attention to the monitor during the face phase were excluded from the analyses. In total, 34 congruent and 33 incongruent trials for café cats, and 26 congruent and 27 incongruent trials for household cats were analyzed (69 trials excluded overall). A coder who was blind to the conditions counted the number of frames (30 frames/s) in which the cat attended to the monitor. To check inter-observer reliability, an assistant who was also blind to the conditions coded a randomly chosen 20% of the videos. The correlation between the two coders was high and positive (Pearson’s r = 0.88, n = 24, p < 0.001).
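For illustration, the conversion of frame counts to looking time and the inter-observer correlation can be computed as in the following sketch, assuming two hypothetical arrays of frame counts per trial (scipy’s pearsonr is used here; the paper does not state which software computed the statistic):

```python
import numpy as np
from scipy.stats import pearsonr

FPS = 30  # videos were coded at 30 frames per second

# Hypothetical frame counts of attention to the monitor for the same 24 trials,
# scored independently by the main coder and the reliability coder.
coder1_frames = np.array([120, 45, 210, 60, 95, 30, 180, 75, 150, 40, 66, 99,
                          110, 52, 205, 58, 90, 33, 175, 80, 140, 47, 70, 101])
coder2_frames = np.array([115, 50, 200, 55, 100, 28, 185, 70, 155, 38, 60, 95,
                          112, 49, 198, 62, 94, 35, 170, 77, 138, 45, 72, 97])

# Convert frame counts to looking time in seconds.
coder1_sec = coder1_frames / FPS
coder2_sec = coder2_frames / FPS

r, p = pearsonr(coder1_sec, coder2_sec)
print(f"Pearson's r = {r:.2f}, n = {len(coder1_sec)}, p = {p:.3g}")
```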
We us