Tagging of English and Cantonese data
Grammatical categories
The grammatical category labels for the English corpus are based on the MOR grammars for English in the CHILDES Windows Tools, while those for the Cantonese corpus are based on those of Cancorp (Lee et al. 1996), with thirty-three categories distinguished, as shown in Table 1 (see MacWhinney 2000: 364-365). The labels are those used in Cancorp, apart from the following modifications:
(i) the category 'particle' (prt) rather than 'clitic' is used for the postverbal modal dak1 and for postverbal dou3 introducing an extent complement;
(ii) the category 'localizer' (loc) is used for locative expressions such as dou6, as in zoeng1 toi2 dou6 '(lit.) the table there', as well as for expressions such as haa6bin6 'down there', which are tagged as locative noun phrases (nnloc) in Cancorp;
(iii) the category 'onomatopoeic expression' (onoma) is introduced in our Cantonese corpus for sounds such as wo1wo1 'barking of dogs' and baang4 'crashing/shooting noise';
(iv) the category 'ditransitive verb' (vd) is applied only to verbs which allow two NP objects, such as bei2 'give', excluding other three-place predicates such as baai2 'put'.
Table 1. Syntactic categories used in the Cantonese corpus

| No. | Label | Syntactic category | Examples | Gloss |
|---|---|---|---|---|
| 1 | adj | adjective | sau3, leng3, faai3, hou2teng1 | thin, pretty, fast, good to listen to |
| 2 | advf | focus adverb | dou1, sin1, jau4, zung6 | also, first, again, still |
| 3 | advi | adverb of intensity | gam3, hou2, taai3, zeoi3 | so, very, too, most |
| 4 | advm | adverb of manner | gwaai1gwaai1dei2, maan6maan2 | obediently, slowly |
| 5 | advs | sentential adverb | jan1wai6, so2ji5, bat1jyu4 | because, therefore, how about |
| 6 | asp | aspectual marker | zo2, gwo3, gan2, hoi1, haa5 | PFV, EXP, PROG, HAB, DEL |
| 7 | aux | auxiliary/modal verb | jing1goi1, wui5, m4hou2 | should, would, don't |
| 8 | cl | classifier | bun2, go3, gaa3, tiu4 | CL |
| 9 | com | comparative morpheme | di1 as in leng3 di1, gwo3 as in leng3 gwo3 keoi5 | more beautiful, prettier than her |
| 10 | conj | connective | ding6hai6, tung4maai4, waak6ze2 | or, and, or |
| 11 | corr | correlative | jat1lou6...jat1lou6, jyut6...jyut6 | while, the more...the more |
| 12 | det | determiner | li1, go2, dai6 | this, that, number |
| 13 | dir | directional verb | lei4/lai4, heoi3, ceot1, jap6, soeng5, lok6 | come, go, out, in, go up, go down |
| 14 | ex | expressive utterance | ai1jaa3, e3, m4goi1 | oops, well, please/thanks |
| 15 | gen | genitive marker | ge3 as in Timmy ge3 pang4jau5 | Timmy's friends |
| 16 | ins | emphatic inserted marker | gwai2 as in gam3 gwai2 lyun6 | what a mess! |
| 17 | loc | localizer | dou6 as in zoeng1 toi2 dou6, soeng6min6 | on the table, up there |
| 18 | nn | noun | ce1, wun6geoi6, sing1sing1, kau3fu2 | car, toy, star, uncle |
| 19 | nnpr | pronoun | ngo5, lei5, keoi5, ngo5dei6, lei5dei6, keoi5dei6 | I/me, you, s/he, we/us, you (pl), they/them |
| 20 | nnpp | proper noun | ciu1jan4, je4sou1, jing1gwok3 | Superman, Jesus, Britain |
| 21 | neg | negative morpheme | m4, mai6, mou5 | not, not, not have |
| 22 | onoma | onomatopoeic expression | wou1wou1, baang4, gok6gok6 | ONOMA |
| 23 | prt | (postverbal) particle | dak1, dou3, saai3, maai4, jyun4 | can, until, all, as well, finish |
| 24 | prep | preposition | hai2, bei2 | at, for |
| 25 | q | quantifier | jat1, sap6saam1, mui5 | one, thirteen, each |
| 26 | rfl | reflexive pronoun | zi6gei2 | self |
| 27 | sfp | sentence-final particle | aa3, laa1, gaa3, ho2 | SFP |
| 28 | vd | ditransitive verb | bei2, sung3 | give, give (as a gift) |
| 29 | verg | ergative (unaccusative) verb | dit3, tyun5 | fall, break |
| 30 | vf | function verb | hai6, jau5 | be, have |
| 31 | vi | intransitive verb | siu3, jau1sik1, kei4tou2 | smile, rest, pray |
| 32 | vt | transitive verb | sik6, gong2, zi1dou3 | eat, say, know |
| 33 | wh | wh phrases | bin1go3, mat1je5 (me1), bin1dou6, dim2gaai2 | who, what, where, why |
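Several romanized forms in Table 1 are listed under more than one category: bei2 appears as both a ditransitive verb and a preposition, gwo3 as both an aspect marker and a comparative morpheme, and gaa3 as both a classifier and a sentence-final particle. The following is a minimal illustrative sketch, not the tagging program used for the corpus, of how such forms could be flagged for manual word-class checking. The names `ENTRIES`, `build_lexicon`, and `ambiguous_forms` are hypothetical, and the entries are a hand-picked subset of the table.

```python
from collections import defaultdict

# Hand-picked subset of Table 1 as (romanized form, category label, gloss).
# This is illustrative data only, not the corpus lexicon.
ENTRIES = [
    ("bei2", "vd", "give"),
    ("bei2", "prep", "for"),
    ("gwo3", "asp", "EXP"),
    ("gwo3", "com", "comparative"),
    ("gaa3", "cl", "CL"),
    ("gaa3", "sfp", "SFP"),
    ("zo2", "asp", "PFV"),
    ("sik6", "vt", "eat"),
]


def build_lexicon(entries):
    """Group (label, gloss) readings under each romanized form."""
    lexicon = defaultdict(list)
    for form, label, gloss in entries:
        lexicon[form].append((label, gloss))
    return dict(lexicon)


def ambiguous_forms(lexicon):
    """Return, in sorted order, the forms that carry more than one category label."""
    return sorted(
        form
        for form, readings in lexicon.items()
        if len({label for label, _ in readings}) > 1
    )


LEXICON = build_lexicon(ENTRIES)
print(ambiguous_forms(LEXICON))
```

In this sketch, a form with a single reading can be tagged directly, while every form returned by `ambiguous_forms` needs a decision based on its context in the utterance.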
Morpheme tier %mor
The %mor tier was generated using a tagging program developed by Lawrence Cheung. Since Cantonese has many homophonous morphemes, it was necessary to disambiguate them with respect to word class. The disambiguation and checking were performed by Gene Chu and Simon Huang for both the Cantonese and the English files.
Cantonese tier %can
The child's Cantonese was first transcribed in romanized Cantonese rather than in Chinese characters. The %can tier was generated at a later stage to give readers of Chinese characters quicker access to the speakers' utterances. Fonts for Cantonese characters are available at the Hong Kong SAR government website, http://www.5c.org/, as well as through Microsoft.
The same characters are used for allophonic representations of a morpheme. Due to ongoing sound changes, there is variation, especially between the initials n/l and ng/zero (Matthews and Yip 1994: 29-30). For example, the first person pronoun is represented as ngo5 in the corpus but is often pronounced o5. The second person pronoun is represented as lei5, although the prescribed form is nei5. The demonstrative has several variant forms: li1/ni1/ji1/nei1/lei1 'this'. The experiential aspect marker may appear as gwo3 or go3. Other alternative forms result from contraction: for example, mat1je5 'what' becomes me1, and hou2 m4 hou2 'is it okay?' becomes hou2 mou2.
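When searching the romanized tiers, it may help to fold pronunciation variants onto a single form. The sketch below is an assumption-laden illustration, not part of the corpus tools: the `VARIANTS` mapping and the `normalize` function are hypothetical names, the mapping contains only the variants mentioned above, and li1 is treated as the reference form for 'this' simply because it is the form listed in Table 1.

```python
# Variant pronunciation -> reference form, restricted to examples given in the text.
# go3 (a variant of experiential gwo3) is deliberately left out: go3 is also a
# classifier in Table 1, so it cannot be rewritten without looking at context.
VARIANTS = {
    "o5": "ngo5",    # first person pronoun, ng/zero initial variation
    "nei5": "lei5",  # second person pronoun, n/l variation
    "ni1": "li1",    # demonstrative 'this' and its variants
    "ji1": "li1",
    "nei1": "li1",
    "lei1": "li1",
}


def normalize(utterance: str) -> str:
    """Replace known variants token by token; all other tokens pass through unchanged."""
    return " ".join(VARIANTS.get(token, token) for token in utterance.split())


# A constructed token sequence for demonstration, not an utterance from the corpus.
print(normalize("nei5 sik6 ni1 go3"))
```

A plain lookup of this kind is only safe for forms that are unambiguous in isolation. Contractions such as me1 and ambiguous forms such as go3 would need context-sensitive handling.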