Before the conclusive demonstration that unwritten languages could be classified genetically, they were often relegated to a typological classification, which at one time was denigrated by scholars. Since 1917, however, the prestige of some kinds of typology has risen--in particular, that of grammatical typology. The best-known typological frame of reference represents the grammar of a language, either as a whole or as a subsystem. Once a genetic classification has been established, typological classification may be superimposed on it in order to show change of language type--as from a predominantly inflectional language (such as Proto-Germanic) to a predominantly isolating one (such as modern English)--or to show features that are shared by languages in neighbouring branches in the same family (e.g., Celtic and Germanic in Indo-European). The ultimate grammatical typology is that which treats subsystems that are, in some sense, universal to all human languages.
Lexical typologies, based on similarities in vocabulary structure, have been used in cognitive anthropology and psycholinguistics (e.g., in studies of the perception of colours and the use of colour terms). The sociolinguistic frame of reference in typology classifies varieties of language by their functions and by the ways in which they identify social groups and cultural spaces; it also brings order and coherence to the problems of establishing national standards that face new nations with many nonstandard and unwritten languages as well as written ones.
A few points of terminology should be explained before the world's languages are discussed further. Language family is the label often used for a conservative genetic classification, one that can be established only when an abundance of cognates (related words) is available. Phylum is the label for a liberal genetic classification that rests on fewer cognates; a phylum encompasses language families. Although a given phylum will have greater extension than any of the families included in it, only fragments of the phonology of its protolanguage will be reconstructible. In actual linguistic usage, however, the term family is often employed to refer to a group that is technically a phylum--e.g., the Afro-Asiatic (Hamito-Semitic) family, the Sino-Tibetan family.
The label language isolate is used for a language that is the only representative of a language family, such as Basque or the extinct Sumerian language; the presumptive but unknown sister languages of isolates are dead and unrecorded. A language isolate may be classified, along with normal language families, under the rubric of an extensive phylum (e.g., Korean is sometimes classified as a member of a hypothetical Ural-Altaic phylum) or left wholly unclassified (e.g., the Ainu language of Japan). The label pidgin-creole is used for a language that has had so much vocabulary change that cognates for reconstructing the protolanguage from which it descended cannot be found. A pidgin is a contact language used for communication between groups having different native languages. When a pidgin becomes the native language of a community, it is customarily called a creole.
This article begins with a survey of world languages based on geographic regions of unequal size: huge and sprawling areas for the peripheral regions of Africa, Oceania, and the Americas but relatively compact areas for the focal regions within Eurasia. Nine regions--six in Eurasia, in all of which writing and standard languages are widespread--constitute a convenient basis for comparison and contrast. The larger part of the article consists of more detailed examinations of the languages of the world arranged by genetic affinities.
Facets of the subject of language and human communication are treated in a variety of articles. For a full account of the theory and methods of linguistic science, see the article LINGUISTICS. For information on such subjects as the characteristics of language, language variants (slang, jargon), speech production, and the acquisition of language, see the article LANGUAGE. For a full account of phonetics and the pathology of speech, see the article SPEECH. For information on written languages and writing systems, see the article WRITING.
For coverage of related topics, see SPECTRUM, section 514, and the Index.
An unusually small degree of genetic diversity is found among European languages: there are fewer language families in Europe than in any other continent-size region of the world. In addition, many of these languages have literary traditions that have preserved earlier forms of the present-day languages. Every European language with a writing tradition has developed at least one standard that is recognized nationally, and the national standard often coexists with recognized regional standards.
A few European languages are used internationally as lingua francas, but pidgins and creoles are little used in Europe today.
Typological classifications have been superimposed on genetic classifications of European languages in particular. For example, the Italic branch of Indo-European languages may be grouped with the Greek, Celtic, and Germanic branches on the basis of certain structural features, as can Armenian with Greek and the Indo-Iranian languages, and so forth.
Similar difficulties in counting separate languages exist for all the branches in which several languages are spoken. In terms of areas of high mutual intelligibility (which do not entirely reflect historical development), there are only five modern Germanic languages: English, Frisian, Netherlandic-German (including Afrikaans and Yiddish), Insular Scandinavian, and Continental Scandinavian. If literary tradition and national criteria are considered, the number is increased by the division of Netherlandic-German into Standard High German, Low German, Dutch-Flemish, Afrikaans, Luxemburgian, and Yiddish; the division of Insular Scandinavian into Icelandic and Faroese (Faeroese); and the division of Continental Scandinavian into Norwegian (which is further subdivided into New Norwegian [or Nynorsk] and Dano-Norwegian [or Bokmål]), Danish, and Swedish.
For the Slavic languages there are 13 literary standards, but between the nuclei formed by these norms there are scarcely any linguistic boundaries, because transitional dialects connect adjacent areas. In terms of intelligibility and, to some extent, in terms of shared features, the Slavic literary norms can be grouped into three zones: East Slavic (including Russian, Belarusian, and Ukrainian), West Slavic (including Polish, Kashubian, Lower [or Low] Sorbian, Upper [or High] Sorbian, Czech, and Slovak), and South Slavic (including Slovene, Serbo-Croatian, Macedonian, and Bulgarian).
Language boundaries are more clear-cut for the modern living languages in the remaining European branches of Indo-European: Celtic (the physical separation of the speakers of the languages contributes to the separate identification of Welsh, Breton, Irish, and Scottish Gaelic), Baltic (literary and political boundaries coincide with the division between Lithuanian and Latvian), Greek (separate historical development and a lack of mutual intelligibility distinguish Modern Greek from Tsakonian), and Albanian (the political unity of Albania contributes to the identification of Gheg and Tosk, its two divergent dialects, as a single language). (see also Index: Celtic languages, Baltic languages, Greek language, Albanian language)
Dialects of two languages in the Indo-Iranian branch of Indo-European also are or were spoken in Europe: the Jassic dialect of Ossetic, an Iranian language, formerly spoken in Hungary; and the European dialects of Romany, which was spread by Gypsies throughout Europe and into America. It may be, however, that Romany still serves as a native language only in Wales, Finland, and the Balkans, and Welsh and Finnish Romany have few speakers left. (see also Index: Ossetic language, Romany language)
A number of earlier Indo-European languages that died out without descendants are known from written records and comments by contemporaries; these include several in the Italic branch (such as Oscan, Umbrian, Faliscan, Venetic), at least four in the East Germanic group (including Burgundian, Ostrogothic, Visigothic, and Vandalic), and three in the Celtic branch (Gaulish, Cornish, and the recently extinct Manx). The classification as Indo-European or non-Indo-European for many other extinct languages (such as Pictish, spoken in what is now northern Scotland) remains uncertain because of the scarcity of data.
For more information on the Indo-European languages of Europe, see Greek language; Italic languages; Romance languages; Germanic languages; English language; Celtic languages; Baltic languages; Slavic languages; Albanian language.
The Samoyedic branch of Uralic is represented in Europe by the language of the Nenets, who live in northernmost Russia from the mouth of the Northern Dvina River eastward into North Asia. The remaining Uralic language of Europe, Hungarian, belongs to the Ugric subgroup. See Uralic languages. (see also Index: Nenets language)
Ethnolinguistic loyalties also may increase the number of languages distinguished--e.g., the separate recognition of Bengali and Assamese. Most of the Indo-Aryan languages have many millions of speakers--e.g., Bengali, Assamese, Hindi, Marathi, Maithili, Magahi, Bhojpuri, Gujarati, Oriya, Sinhalese (in Sri Lanka), Sindhi, and Nepali. There are also large numbers of speakers of Indo-Aryan languages (especially West Hindi) in South Africa, in various parts of the South Pacific, and in South America (particularly Suriname and Guyana).
The languages of the Dardic subgroup differ enough from the other Indo-Aryan languages that they are sometimes classified as Iranian rather than Indo-Aryan, or as a separate subbranch coordinate with the Indo-Aryan and Iranian subbranches. Kashmiri, spoken in Jammu and Kashmir, is the only Dardic language with a literary tradition. Shina also is spoken in Jammu and Kashmir; other Dardic languages, which are spoken mostly in Pakistan, have relatively few speakers. (see also Index: Kashmiri language, Shina language)
Some speakers of at least four Iranian languages also are found in South Asia, including Pashto speakers in Pakistan and Balochi (Baluchi) speakers in both Pakistan and India (see Indo-Iranian languages).
Whether or not Altaic and Uralic are related to each other, there is no doubt that languages of both stocks share many typological features, such as vowel harmony, agglutination (a type of word formation in which word elements are added together but still retain a separate, definite meaning), and a restriction against combining a plural noun with a quantifier (as though English speakers had to say "five girl" rather than "five girls").
In North Asia, Turkic languages are distributed from the southern extension of North Asia northeastward through central Siberia and include Turkmen in Turkmenistan, Iran, and Afghanistan; Uzbek in the Central Asian countries, primarily Uzbekistan, and in Afghanistan; Kyrgyz in Kyrgyzstan and in neighbouring areas from Afghanistan to China; Karakalpak in the Karakalpakstan republic of Uzbekistan; and Kazak in Kazakstan. Six or seven Turkic groups immediately north of western Mongolia are much smaller in terms of both numbers of speakers and the region they inhabit; in northern Siberia the Yakut extend from the Sakha republic (Yakutia) west to the Taymyr autonomous okrug (district), where the Dolgan people speak a Yakut dialect.
Most speakers of Manchu-Tungus languages are bilingual in the official language of their country, and many are replacing their native languages with Russian or Chinese. After the Manchu of China, the best known and most numerous of the Manchu-Tungus peoples are the Evenk. Other groups include the Even (or Lamut), the Nanai (or Gold), and several relatively small tribes. For more information on the Turkic, Mongolian, and Manchu-Tungus languages, see Altaic languages.
The northernmost and most widespread of these linguistic groups, and the only one that includes more than one living language, is the Luorawetlan family, which consists of Chukchi, Itelmen (Kamchadal), and Koryak. Most scholars now classify Kerek and Aliutor, once considered to be dialects of Koryak, as independent languages. (see also Index: Luorawetlan languages)
Nivkh (Gilyak), spoken on Sakhalin Island and in the coastal and inland Amur River country of the mainland, has no known linguistic relatives. Ket (or Yenisey-Ostyak) is the only language of the Yeniseian, or Yenisey-Ostyak, family that is still spoken. Ket speakers live along the upper and middle Yenisey River, as did the speakers of its sister languages: Kott (Cottian-Manu), which became extinct in the 19th century, and Assan (Asan) and Arin, both of which became extinct in the 18th century (see Paleo-Siberian languages). (see also Index: Nivkh language, Ket language, Arin language)
Sumerian, which is preserved in written records, was the earliest language of civilization in the ancient Middle East; it was neither Semitic nor Indo-European (see also Sumerian language). Early literary traditions and literacy for the elite began in this central area of Southwest Asia and extended from the Sumerian, Old Persian, and Akkadian literatures to Asia Minor (Hittite) in the north and to the Nile River (Egyptian) in northeastern Africa. Akkadian and Persian seem to have been the first two languages put to wide international use. (see also Index: Sumerian language, Akkadian language, Persian language)
The half-dozen Nuristani languages spoken in Afghanistan and part of Pakistan, sometimes classified as members of the Dardic subgroup of Indo-Aryan, more recently have been classified by some scholars as constituting a separate branch of Indo-Iranian. In addition, some Landa (Indo-Aryan) speakers also live in Afghanistan. Two very divergent dialects of another Indo-Aryan language, Romany, are spoken in Southwest Asia--Armenian Romany and Asiatic Romany (the dialect of the Palestinian Gypsies). For more information on the Iranian and Indo-Aryan languages, see Indo-Iranian languages. (see also Index: Dardic languages)
The long-extinct languages of the Anatolian branch of Indo-European, including Hittite, were once spoken in Southwest Asia (see Anatolian languages).
Five Turkic languages are spoken primarily in Southwest Asia: Turkish, spoken in Turkey and in surrounding countries, largely to the north; Azerbaijani, spoken chiefly in Azerbaijan and Iran; and Kumyk, Karachay, and Nogay, spoken in the Caucasus. Three Turkic languages spoken predominantly in North Asia also are spoken in Southwest Asia: Uzbek, Turkmen, and Kyrgyz.
One language of the Mongolian family, Mogol, is spoken in Southwest Asia, in Afghanistan; and Brahui, a Dravidian language, has a small fraction of its speakers in Afghanistan and Iran. (see also Index: Mogol language, Brahui language)
Three general types of syntax, which partly overlap the liberal genetic classification, can be distinguished among languages in East Asia. First, Ainu is isolated syntactically as well as genetically. The second type is shared by Korean and Japanese. All Chinese languages are strikingly alike in syntax, and this third type is approximated among some non-Chinese languages of the Sino-Tibetan family and among some languages of Southeast Asia whose genetic classification remains uncertain.
Striking similarities in syntax have led some linguists to postulate a remote relationship between the Altaic languages and Korean and, less frequently, Japanese.
The Japanese language family includes, besides Japanese, several mutually unintelligible dialects spoken on the Ryukyu Islands by people who are bilingual in mainland Japanese. Japanese is spoken by some 125 million people in Japan and by small groups in Brazil and the United States, especially in Hawaii (see Japanese language).
Ainu, the remaining language in insular East Asia for which not even a remote relationship with other languages seems likely, originally was spoken in Japan and on Sakhalin Island and the Kuril Islands. By the late 20th century it was virtually extinct, with only a few speakers in northern Japan. (see also Index: Ainu language)
Although speakers of two different Chinese languages may not be able to understand one another when they talk, they can communicate in writing; conversely, the same written message is read aloud differently by speakers of different Chinese languages. The functional advantages of Chinese writing explain its persistence for four millennia, but these advantages are partly offset by the difficulty each generation faces in learning the thousands of characters needed for literacy. Traditionally, most Chinese were thought to be illiterate, but, with simplified characters and romanization, the majority of the people in China are now literate. The Chinese languages are notable for their enormous numbers of speakers, and Mandarin has more speakers than any other of the world's languages (some 800 million native speakers).
A remote relationship within one family (Sino-Tibetan) has been postulated for the Chinese languages and all the other non-Altaic families that have languages spoken in China. Although these languages undoubtedly share many similarities with Chinese, current knowledge cannot reveal to what extent such similarities result from borrowing rather than from common origin.
More distantly related Tibeto-Burman languages are spoken in East Asia along the border with Myanmar (Burma); these languages, often called Burmic, include dialects of the Burmese-Lolo subgroup (including Burmese) and the Kachin subgroup. For more information on the Chinese, Tibetan, and Burmic languages, see Sino-Tibetan languages.
Curiously enough, it is in Melanesia, between the Bismarck Archipelago and Vanuatu, that the most diverse Austronesian languages are spoken today; this provides grounds for the conjecture that the Proto-Austronesian language was spoken there millennia ago and that the daughter languages diversified as their speakers migrated throughout much of the world, with Malay and Cham backtracking eventually to mainland Southeast Asia, out of which the ancestors of Proto-Austronesian speakers must have come. (see also Index: Malay language, Cham language)
In general, the name of the country and the name of the national language are the same in both insular and mainland Southeast Asia. Thus, Pilipino (based on Tagalog) is the name of one of the national languages of the Philippines, even though Pilipino is learned as a second language by most Filipinos. A widespread fear of indirect neocolonial domination throughout Southeast Asia motivates continued distrust of the old languages of colonialism (English, French, Dutch, Spanish) and now also of Japanese and Russian. A pidgin-creole called Neo-Melanesian (Melanesian Pidgin English) is used as a lingua franca by speakers of Austronesian and other languages from southern Papua through Melanesia into Micronesia. (see also Index: Pilipino language, neocolonialism)
Though the languages in the mainland subregion of Southeast Asia are genetically diverse, many of the same typological features, such as the use of distinctive tones and classifiers, are widespread among unrelated or only remotely related languages.
The language of mainland Southeast Asia with the greatest number of speakers is Vietnamese, spoken in Vietnam and by smaller numbers of speakers in Cambodia, Thailand, and Laos. Muong, spoken in the central highlands of northern Vietnam, is recognized as a separate, but related, language and shows far less Chinese influence. (see also Index: Vietnamese language, Muong language)
Classified as a northern group of the Mon-Khmer family are several languages spoken in Myanmar (east of Mandalay), northwestern Thailand, northern Laos, and to a lesser extent in northern Vietnam and in southwestern China. These include languages of the Palaungic, or Palaung-Wa, branch, spoken in Myanmar, Thailand, China, and Laos; and the Khmuic branch, spoken in Laos, Thailand, and Vietnam. (see also Index: Palaungic languages)
Another branch of Mon-Khmer, the Aslian branch, is composed of three small groups of related languages in Malaysia. They are the North Aslian, or Semang, subbranch, spoken in the inland area of northern and central Malaysia and across the border in Thailand; the Senoic, or Sakai, subbranch, with speakers south of Kuala Lumpur on the coast and inland farther south; and the Semelaic, or South Aslian, subbranch, spoken south of the Senoic languages. Data on the Nicobarese languages, spoken on the Nicobar Islands, suggest that they form a distinct branch (Nicobarese) of the Mon-Khmer family (see Austroasiatic languages). (see also Index: Semelaic languages)
Many millions of Chinese are distributed throughout Southeast Asia; of these, more than 7 million are in Thailand, 1.7 million in Malaysia, 1 million in Vietnam, and smaller numbers in Myanmar, Cambodia, and Laos.
Of the other language groups in the Sino-Tibetan family in Southeast Asia, the Burmese-Lolo (Burmish) group has the widest distribution and the greatest number of speakers. Burmese is spoken as a second language by perhaps 90 percent of those in Myanmar who have another first or native language. The Lolo languages are spoken in Myanmar, Thailand, Laos, and Vietnam; they include Lisu, Lahu, Akha, Mung, Punoi, Pyen, and others, a few of which extend into northeastern India. Karen languages are spoken in Myanmar and Thailand and include Sgaw, Pho, Pa-o (or Taungthu), and Palaychi. Most of the languages of the Kuki-Chin (Kukish) group are spoken in Myanmar. Kachin languages also are spoken in Myanmar (see Sino-Tibetan languages and Tai languages). (see also Index: Burmese language)
The languages of Polynesia, including Maori in New Zealand, Tongan, Tahitian, and Hawaiian, form a subgroup within a larger Eastern Oceanic subgroup of more than 100 languages, which includes, besides the Polynesian languages, such languages as Fijian and a number of languages of the Solomon Islands. At least seven of the languages of Micronesia (including Gilbertese, Chuukese, and Pohnpeian) form another subgroup.
More than 100 Austronesian languages are spoken throughout New Guinea, and more than 100 Austronesian languages, not counted as Eastern Oceanic, are spoken on smaller islands of Melanesia. Those few with as many as 10,000 speakers are all used as lingua francas in wider areas than those of their native speakers (Dobu in the D'Entrecasteaux Islands, Banoni in southwestern Bougainville, Panayati in the Louisiade Archipelago). Among the Austronesian languages still spoken on Taiwan are Ami, Atayalic, Paiwan, and Bunun. There is some scholarly disagreement concerning the classification of the Austronesian languages (see Austronesian languages).
An exceptionally liberal genetic classification claims that the many non-Austronesian languages in Melanesia and the few in Indonesia all belong to one phylum. Conservative classifications recognize several or even many different language families and avoid the older name for them (Papuan), because it might suggest either that the unrelated families of non-Austronesian languages are branches of one Papuan family or else that non-Austronesian languages are found only on the island of New Guinea. On the other hand, no classification disputes the claims that all Australian languages are ultimately related and that they are related neither to Austronesian languages nor to non-Austronesian languages outside Australia. (see also Index: Papuan languages)
In Melanesia, which essentially constitutes the non-Austronesian world beyond Indonesia, there is much contact between Austronesian and non-Austronesian languages. Many of the Melanesian societies are multilingual, especially those in New Guinea; in addition to their native language, speakers often learn a few secondary languages--those of their immediate neighbours or, most frequently, Neo-Melanesian (a pidgin-creole with an English-based lexicon) or both.
In part of Papua New Guinea, Police (or Hiri) Motu, a pidgin based on an Austronesian language, is used as a lingua franca far beyond the territory of the few thousand native speakers of Motu. In Australia the same interest in mastering a multiplicity of languages is widespread, and Aborigines have developed another English-based pidgin-creole, quite different from Neo-Melanesian. Another parallel between Australian languages and the non-Austronesian languages north of Torres Strait is the reluctance of speakers in both areas to recognize or develop any one dialect of a language as a standard. (see also Index: Police Motu language)
There remain a number of families and isolated languages that seem not to be related to other Papuan languages. A liberal classification presented by the American linguist Joseph Greenberg in 1971, however, treats all the Papuan languages as genetically related in an Indo-Pacific phylum, which also includes Andamanese. Most Papuan languages have only a few hundred to a few thousand speakers (see also Papuan languages).
In grammatical typology the non-Austronesian languages north of Torres Strait are heterogeneous, while the Australian languages are syntactically homogeneous and almost identical in patterns of sound combinations. Both Australian languages and non-Austronesian languages have dialects that are linked in a chain such that speakers at either end do not understand the vocabulary of speakers at the other end, although speakers of adjacent dialects can understand each other.
The available data on the two or more languages that were spoken on Tasmania until the later part of the 19th century show a typical Australian sound system, but they have not been linked convincingly to the Australian languages.
Languages from Southwest Asia preceded the languages of European colonization: migrations of peoples to North Africa brought the Ethiopians almost three millennia ago and the Arabic speakers many centuries ago. The Phoenician circumnavigation of Africa in ancient times left traces--Phoenician coins--on the coasts but none in the interior. Long ago, migrants from Indonesia reached Madagascar, 250 miles off the African coast. Before and during the colonial period, Arab and Indian traders reached East Africa, where today a few Indo-Aryan languages are spoken among Asians.
The interior of Africa was not known to any non-Africans before the colonial period, but its prehistory can now be partially reconstructed. For example, there is evidence that the homeland of the protolanguage of the numerous Bantu languages was in Cameroon or an adjacent area in West Africa (or in both areas); that a prehistoric migration brought the Bantu speakers to Central and East Africa; and that the movements of these Bantu speakers forced the speakers of San and Khoisan languages to leave their homeland around Lake Victoria and move south to the Kalahari.
In all the postcolonial nations today, English, Arabic, or French serves both as an international language and as a functioning national language. The question still unresolved for many African nations is which of their indigenous languages to develop through writing and to standardize as the official language or languages of education and of the political state. The numerous pidgin-creoles, such as Krio, are recent and colonial in inspiration; Sango in the Central African Republic is surely indigenous but not so surely a pidgin-creole. Most of the dozen or so languages used in trade, such as Swahili in East Africa and Hausa in West Africa, tend, like pidgin-creoles, to undergo great changes in vocabulary, but they are not classified as pidgin-creoles; instead they are varieties of normal languages that function as lingua francas. Lingua francas of one sort or another are a prerequisite for the markets found throughout rural Africa. (see also Index: Swahili language, Hausa language)
Despite the genetic diversity of the languages of South Africa and the even greater diversity in West Africa, a part of each of these subregions can be shown, on the basis of typology, to be a linguistic area. Linguists have found, for example, that most languages in West Africa distinguish vocabulary items and word elements by tone; in South Africa the clicks characteristic of Khoisan languages also are found in neighbouring Bantu languages such as Xhosa and Zulu. The early use of typology to anticipate genetic classification, however, led to the claim that Africa was full of mixed languages--e.g., Mbugu in Tanzania. But Mbugu, despite having borrowed prefixes and culture words from Bantu, can be shown to have a single line of origin--to have descended from a single protolanguage (Proto-Cushitic)--on the basis of its grammatical constituents (in particular its pronouns and verb forms) and basic vocabulary items that are cognate with those of other Cushitic languages. (see also Index: Mbugu language)
Five Semitic languages are spoken in Africa, if modern colloquial Arabic is counted as a single language throughout its range across North Africa and the Arabian Peninsula and if Gurage in Ethiopia also is counted as a single language. The Semitic languages in Ethiopia include Amharic, Tigrinya, and Gurage (but the people grouped as Gurage may be speaking several separate languages). Tigré and Tigrinya are spoken in Eritrea.
Cushitic languages are spoken in Eritrea, Ethiopia, Somalia, The Sudan, Tanzania, and Kenya. The languages with the greatest number of speakers are Gallinya, Somali, Sidamo, Hadya, and Afar-Saho. Some scholars consider a group of languages traditionally classified as Cushitic to be a separate branch of Hamito-Semitic, called Omotic. These languages, spoken in Ethiopia, include Walamo, which has far more speakers than any other Omotic language, as well as Ari, Shako, Zaysse, and others with only a few thousand or a few hundred speakers.
The languages of the Berber branch are spoken from the western desert of Egypt west to the Atlantic and extend to Senegal on the coast and to northern Nigeria in the interior. Guanche, an extinct language that may have been an offshoot of Berber, was formerly spoken on the Canary Islands. Berber languages include Shluh, spoken in Morocco; Tamashek (Tuareg) in Algeria, Libya, Niger, and Mali; and Tamazight in Morocco and Algeria (see below Hamito-Semitic languages).
Among the major Eastern Sudanic languages are Teso in Uganda and Kenya, Dinka in The Sudan, Luo in Kenya and Tanzania, and Lango in Uganda. Only three of the 30 or so languages of the Central Sudanic subgroup of Chari-Nile are spoken by groups of some 100,000 people: Sara in the Central African Republic and Chad, Lugbara in Uganda and Zaire, and Mangbetu in Zaire.
Among the Nilo-Saharan languages that are not classified as Chari-Nile is the Saharan group. Kanuri, its largest member, is spoken by several million people in Nigeria, Niger, Cameroon, and Chad. In the Maba group, Masalit is spoken in The Sudan. Songhai, often classified as a language isolate, is spoken by about a million people in Niger, Mali, and Burkina Faso. Fur, also sometimes considered to be an isolate, is spoken mostly in The Sudan. (see also Index: Saharan languages, Maba languages, Songhai language)
Other subgroups in the Niger-Congo family include only a few dozen languages, such as those in the Mande subgroup in West Africa, which are spoken from Mauritania to Ghana (including Bambara, Mende, and Vai). The Gur (Voltaic) languages, spoken from Mali and Côte d'Ivoire to Nigeria, include Mossi, with some 4,000,000 speakers, and numerous other languages with significantly fewer speakers. The West Atlantic languages, spoken from Senegal to Nigeria, include Fulani, Wolof, Temne, and several other languages with fewer speakers. Of the languages of the Adamawa-Eastern subgroup, spoken from The Sudan to Cameroon, only Sango, through its use as a lingua franca, may be known by more than 1,000,000 people. The Kwa subgroup of Niger-Congo includes Twi (Akan), Yoruba (in Nigeria and Benin), and Igbo (also known as Ibo; in Nigeria). Some scholars link the Kordofanian languages of North and South Kurdufan provinces in The Sudan with the Niger-Congo languages in a Niger-Kordofanian phylum. (see also Index: Mande languages, Voltaic languages, Kwa languages)
Today there are six European languages in the Americas that serve as languages of both education and government administration. (Several indigenous languages also function in this dual role--Guaraní of Paraguay, Greenlandic of Greenland, and Quechua and Aymara of Peru.) These official languages and their numbers of primary political divisions are Spanish (18); Portuguese (1); Dutch (2)--1 in Latin America and 1 in the Caribbean; English (2 in North America and 11 in the Caribbean); French (1 in North America and 3 in the Caribbean); and Danish (1 in Greenland). Before the colonial period in Latin America and during the first century or two of that period, the following American Indian languages could also be classed as official or semiofficial: Nahuatl (Nahua), the language of the Aztec in Mexico and Central America; Chibcha-Muisca in Colombia; Quechua, the language of the Inca, in the Andean area; Tupí in Brazil; and Guaraní in and around Paraguay. In addition to American Indian languages, two pidgin-creole languages are official in their own political divisions: Sranan (Taki-Taki) in Suriname and Papiamento in Curaçao. Other pidgin-creoles in the Caribbean, such as Haitian Creole, are increasingly being written.
Genetic diversity among languages of continental-sized areas can be expressed in terms of the number of minimum genetic classes taken as the usual basis for discussion by specialists of that area. Research may lead to a downward (or upward) revision, and a new number of minimum genetic classes is then used as a basis for further discussion. For North America (north of Mexico) and for the 20th century, there have so far been three successive bases for discussion: about 50 families in the classification of the U.S. scholar J.W. Powell; six phyla in the classification of the U.S. anthropological linguist Edward Sapir; and a revision of Sapir's scheme made at the 1964 Conference on North American Indian Languages by splitting and reclassification (e.g., of Sapir's Hokan-Siouan) and by merging (e.g., the Muskogean family and a few isolates were added to Algonquian [Algonkian] in the Macro-Algonquian phylum). This third classification is summarized below. Proposals for a minimum number of genetic classes in South America range from more than 100 families to three phyla (in a recent liberal classification).
The Plains Indian sign language (hand talking) is still known, but Chinook Jargon and other pidgin-creoles in North America fell into disuse as soon as American Indians became bilingual in English, French, or Spanish.
Language families in the East give an impression of some typological similarity combined with considerable genetic diversity. On the opposite coast, California is surprisingly homogeneous in culture and in language typology but heterogeneous in genetic classification of languages. There are few languages and only two language families represented in the Great Basin, which is homogeneous in all respects. The adjacent Southwest is anomalous with respect to all three variables considered here (culture, language typology, and genetic classification). Where it is culturally homogeneous, as among Pueblo societies, it is genetically and typologically diverse in language: four different language families are represented in Pueblo societies. Non-Pueblo societies of the Southwest are diverse culturally as well as linguistically. (see also Index: Pueblo Indians)
The Andean-Equatorial phylum includes the greatest number of living languages (almost 200) and the three South American Indian languages with the greatest number of speakers (Quechua, Guaraní, and Aymara). The living Andean-Equatorial languages constitute some 14 families and several language isolates. The Arawakan family includes the largest number of languages--some 100--and has the widest distribution: across northern South America from French Guiana to Colombia and southward as far as Paraguay; formerly, Arawakan languages also were spoken in Central America and the islands of the Caribbean. Most Arawakan languages are spoken by no more than a few hundred people. More than two dozen languages of the Tupian family are still spoken over a large part of South America, principally south of the Amazon River. Tupian languages include Guaraní (Tupí-Guaraní), which is spoken in a number of dialects by about 4,000,000 people in Paraguay, Brazil, Argentina, and Bolivia. Quechua, of the Quechumaran group, is spoken by some 8,000,000 people in Peru, Ecuador, Colombia, Bolivia, Argentina, and Chile. Some Quechua dialects are so divergent that they might be regarded as separate languages. The other Quechumaran language group, Aymaran, is spoken by more than 1,000,000 people in Peru and Bolivia. Most other languages in the Andean-Equatorial phylum are spoken by only a few thousand persons. (see also Index: Guaraní language, Aymaran languages)
The Ge-Pano-Carib phylum includes almost as many living languages as the Andean-Equatorial phylum, but all of them are spoken by relatively small tribes, so that the total number of speakers of Ge-Pano-Carib languages is only a small fraction of the number of speakers of Andean-Equatorial languages. In terms of numbers of languages, the largest family in the Ge-Pano-Carib phylum is the Cariban (Carib) family, with some 60 languages still spoken in Venezuela, French Guiana, Guyana, Suriname, Brazil, and Colombia. Cariban languages were also formerly spoken in the Caribbean islands. Most Cariban languages have fewer than 1,000 speakers. The other large family in the phylum, the Macro-Ge family, includes more than 25 languages in Brazil. (see also Index: Macro-Ge languages)
The languages of the Macro-Chibchan phylum, of which some 39 may still be spoken, are distributed from Guatemala and Honduras southward into, and possibly beyond, Peru. The largest component of the phylum is the Chibchan family, of which 16 languages are still spoken from Nicaragua to northwestern Colombia--these include Cuna, spoken on the San Blas Archipelago of Panama as well as on the mainland of Panama and Colombia; Guaymí in Panama; and Páez in Colombia. (see also Index: Chibchan languages)
For further information on the Indian languages of the Americas, see below Languages of the Americas: North American Indian languages; Meso-American Indian languages; South American Indian languages.
For information on numbers of speakers by country, see the Britannica World Data: Language section in the BRITANNICA BOOK OF THE YEAR. (C.F.V./F.M.V./Ed.)
Iranian languages were spoken in the 1st millennium BC in present-day Iran and Afghanistan and also in the steppes to the north, from modern Hungary to East (Chinese) Turkistan. The only well-known ancient varieties of Iranian languages are Avestan, the sacred language of the Zoroastrians (Parsis), and Old Persian, the official language of Darius I (ruled 522-486 BC) and Xerxes I (486-465 BC) and their successors. Among the modern Iranian languages are Persian (Farsi), Pashto (Afghan), Kurdish, and Ossetic. For more information, see Indo-Iranian languages. (see also Index: Avestan language)
The earliest Slavic texts, written in a dialect called Old Church Slavonic, date from the 9th century AD; the oldest substantial material in Baltic dates to the end of the 14th century, and the oldest connected texts to the 16th century. For more information, see Baltic languages and Slavic languages.
In addition to the principal branches just listed, there are several poorly documented extinct languages of which enough is known to be sure that they were Indo-European and that they did not belong to any of the groups enumerated above (e.g., Phrygian, Macedonian). Of a few, too little is known to be sure whether they were Indo-European or not (e.g., Ligurian).
Table 1 gives examples of typical vocabulary items widely shared within the Indo-European family that have been decisive in establishing the family. A blank indicates that the language in question does not use the item in accordance with the given meaning or that its word for that meaning is unknown.
Similarities in grammatical endings are shown in Table 2 by samples of noun declension and verb inflection in some of the more archaic languages that have retained the inflectional endings of Indo-European in relatively unchanged form. Note that Old Lithuanian -i and -u were nasalized vowels, representing a continuation from the earlier forms *-in and *-un. (The asterisk marks a form that is not actually found in any document or living dialect but is reconstructed as having once existed in the prehistory of the language.) (see also Index: Lithuanian language)
The statable phonetic rules referred to earlier are not always obvious without careful observation. Note that the English dental consonants t, d, and th do not correspond in a straightforward manner to the Greek dental sounds t, d, and th; that is, English t does not occur where Greek t appears, nor English d where Greek has d. But the relationships between the sounds are not random either -- English t does not correspond to Greek t in one word, to d in a second, and to th in a third, according to no discernible pattern. Rather, where Greek has initial t, English has th, as in that and three; where Greek has d, English has t, as in tree, two, and ten; and where Greek has th, English has d, as in daughter. Note also that phonetic similarity as such is not needed to establish relationship. Thus, many of the Armenian words in Table 1 look quite different from the related words in other Indo-European languages, but here too regular rules of correspondence can be found; e.g., Greek initial p corresponds to Armenian h or zero (lack of a consonant) in the words meaning 'fire,' 'father,' 'foot,' and 'five.' (see also Index: English language)
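Because the correspondence is regular, it can be checked mechanically, one cognate pair at a time. The following Python sketch is only an illustration under stated assumptions: the Greek cognates are given in a simplified, unaccented romanization (to 'that', treis 'three', drys 'tree, oak', duo 'two', deka 'ten', thygater 'daughter'), only word-initial dentals are examined, and the names `GREEK_TO_ENGLISH`, `initial_dental`, and `is_regular` are hypothetical.

```python
# Illustrative sketch only: Greek forms are simplified, unaccented romanizations,
# and only word-initial dentals are compared.

# Hypothetical correspondence table: Greek initial dental -> expected English initial.
GREEK_TO_ENGLISH = {"t": "th", "d": "t", "th": "d"}

# (Greek, English) cognate pairs taken from the examples in the text.
COGNATES = [
    ("to", "that"),
    ("treis", "three"),
    ("drys", "tree"),
    ("duo", "two"),
    ("deka", "ten"),
    ("thygater", "daughter"),
]


def initial_dental(word):
    """Return the word's initial dental ('th', 't', or 'd'), or None.

    'th' is tested first so that it is not misread as a plain 't'.
    """
    for onset in ("th", "t", "d"):
        if word.startswith(onset):
            return onset
    return None


def is_regular(greek, english):
    """True if the English initial is the one the table predicts for the Greek initial."""
    onset = initial_dental(greek)
    return onset is not None and GREEK_TO_ENGLISH[onset] == initial_dental(english)


for greek, english in COGNATES:
    print(f"{greek:10} {english:10} {is_regular(greek, english)}")
```

Every listed pair fits the table, whereas a pairing that looks phonetically closer but is not a true correspondence, such as Greek treis with English tree, returns False. This mirrors the point of the paragraph: regularity, not resemblance, is what establishes relationship.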
a stronger affinity, both in the roots of verbs, and in the forms of grammar, than could possibly have been produced by accident; so strong, indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists. There is a similar reason, though not quite so forcible, for supposing that both the Gothick [i.e., Germanic] and the Celtick, though blended with a very different idiom, had the same origin with the Sanscrit; and the old Persian might be added to the same family . . . .
Nineteenth-century linguists firmly established the connections that Jones had elucidated and broadened the family to include Slavic, Baltic, and other language groups. In 1816 Franz Bopp, the German philologist, presented his Über das Conjugationssystem der Sanskritsprache in Vergleichung mit jenem der griechischen, lateinischen, persischen und germanischen Sprache ("On the System of Conjugation in Sanskrit, in Comparison with Those of Greek, Latin, Persian, and Germanic"), in which the relation of these five languages was demonstrated on the basis of a detailed comparison of verb morphology (structure). Two years later there appeared the Undersøgelse om det gamle Nordiske eller Islandske Sprogs Oprindelse (Investigation of the Origin of the Old Norse or Icelandic Language), by the Danish philologist Rasmus Rask, completed in 1814. This work demonstrated methodically the relation of Germanic to Latin, Greek, Slavic, and Baltic. (Rask included Celtic a few years later.) In 1822 the second edition of the first volume of Jacob Grimm's Deutsche Grammatik ("Germanic Grammar") was published; in this grammar were discussed the peculiar Indo-European vowel alternations called Ablaut by Grimm (e.g., English "sing, sang, sung"; or Greek peíth-o 'I persuade,' pé-poith-a 'I am persuaded,' é-pith-on 'I persuaded'). In addition, Grimm tried to find the principle behind the correspondences of Germanic stop and spirant consonants (the first made with complete stoppage of the breath, and the second made with constriction of the breath but not complete stoppage) to the consonants of other Indo-European languages. The sound changes implied by these correspondences have become known as Grimm's law. Examples of it include the stop consonant p in Latin pater corresponding to the spirant consonant f in father, and the correspondences between English and Greek t, d, and th discussed above. (see also Index: Grimm's law)
Bopp demonstrated in 1839 that the Celtic languages were Indo-European, as had been asserted by Jones. In 1850 the German philologist August Schleicher did the same for Albanian, and in 1877 another German philologist, Heinrich Hübschmann, showed that Armenian was an independent branch of Indo-European, rather than a member of the Iranian subbranch. Since then, the Indo-European family has been enlarged by the discovery of Tocharian and of Hittite and the other Anatolian languages, and by the recognition, with the aid of Hittite, that Lycian, known and partly deciphered already in the 19th century, belongs to the Anatolian branch of Indo-European.
The Indo-European character of Tocharian was announced by the German scholars Emil Sieg and Wilhelm Siegling in 1908. The Norwegian Assyriologist Jørgen Alexander Knudtzon recognized Hittite as Indo-European on the basis of two letters found in Egypt (translated in Die zwei Arzawa-briefe [1902; "The Two Arzawa Letters"]), but his views were not generally accepted until 1915, when Bedřich Hrozný published the first report of his own decipherment of the much more copious material that had meanwhile been found in the ruins of the Hittite capital itself.
The first full comparative grammar of the major Indo-European languages was Bopp's Vergleichende Grammatik des Sanskrit, Zend, Griechischen, Lateinischen, Litthauischen, Altslawischen, Gotischen und Deutschen (1833-52; "Comparative Grammar of Sanskrit, Zend, Greek, Latin, Lithuanian, Old Slavic, Gothic, and German"). But this and August Schleicher's shorter Compendium der vergleichenden Grammatik der indogermanischen Sprachen (1861-62; "Compendium of the Comparative Grammar of the Indo-European Languages") were rendered obsolete by the major breakthrough of the 1870s, when scholars--prompted largely by the discoveries of a group of German scholars known as Neogrammarians--realized that sound correspondences are not loose rules of thumb but strictly regular, and that apparent exceptions to sound laws can often be accounted for by stating the laws more accurately or by reconstructing additional distinct sounds in the parent language. The difference between Gothic d in fadar 'father' and þ in broþar 'brother,' for example, both corresponding to t in Sanskrit, Greek, and Latin, proved to be correlated with the original position of the accent, a discovery known as Verner's law (named for the Danish linguist Karl Verner). Thus, d appears when the preceding syllable was originally unaccented (fadar : Greek patér-, Sanskrit pitár-), and þ occurs when the preceding syllable was originally accented (broþar : Greek phráter- 'member of a clan,' Sanskrit bhrátar-). (see also Index: Verner's law)
The knowledge and opinions that had accumulated by the end of the 19th century are largely incorporated in the German linguist Karl Brugmann's Grundriss der vergleichenden Grammatik der indogermanischen Sprachen (2nd ed., 1897-1916; "Outline of Comparative Indo-European Grammar"), which remains the latest full-scale treatment of the family. (see also Index: "Outline of the Comparative Grammar of the Indo-European Languages")
A "labial" sound is made with the lips, and a "dental" sound with the tip of the tongue against the back of the teeth. The "palatal" and "velar" sounds were probably made by contact between the back of the tongue and the soft palate--more toward the front of the mouth in the case of the palatals and more toward the back in the case of the velars (compare Arabic kalb 'dog' versus qalb 'heart'). The "labiovelar" sounds were made by contact between the back of the tongue and the soft palate with concomitant rounding of the lips. "Voiceless" designates sounds made without vibration of the vocal cords; "voiced" sounds are pronounced with vibration of the vocal cords. The exact pronunciation of the "voiced aspirates" is somewhat uncertain; they were probably similar to the sounds transcribed bh, dh, and gh in Hindi.
Correspondences pointing to the voiced labial stop b are rare, leading some scholars to deny that b existed at all in the parent language. A minority view holds that the traditionally reconstructed voiced stops were actually glottalized sounds produced with accompanying closure of the vocal cords. The status of the velar stops k, g, and gh has likewise been questioned. The earlier view that Proto-Indo-European had a series of voiceless aspirated stops (labial ph, dental th, palatal kh, velar kh, and labiovelar kwh) has largely been abandoned. (Aspirated consonants are sounds accompanied by a puff of breath.) There was one sibilant consonant, s, with a voiced alternant, z, that occurred automatically next to voiced stops. The existence of a second apical spirant, þ (presumed pronunciation like that of th in English thin), is extremely uncertain.
There is general agreement that Proto-Indo-European had one or more additional consonants, for which the label "laryngeal" is used. These consonants, however, have mostly disappeared or have become identical with other sounds in the recorded Indo-European languages, so that their former existence has had to be deduced mainly from their effects on neighbouring sounds. Hence, the laryngeal sounds were not suspected until 1878, and even then they were rejected by most scholars until after 1927, when the Polish linguist Jerzy Kurylowicz showed that Hittite often has h (perhaps a velar spirant like the ch in German ach) in places where a laryngeal had been posited on the evidence of the other Indo-European languages. There is still considerable disagreement about how many laryngeals there were, what they sounded like, what traces they left, and how best to symbolize them. Most scholars now believe there were three, which can be written H1, H2, and H3. Of these, H1 may have been h or a glottal stop; H2 was perhaps a pharyngeal spirant like Arabic h in hams 'five'; H3, whatever its other features, was probably voiced. The principal traces they left outside Anatolian are in the quality and length of neighbouring vowels, H2 changing a neighbouring e to a, and probably H3 changing it to o, while all laryngeals lengthened a preceding vowel in the same syllable. In Anatolian, H2 and H3 remained as h, at least in some positions. (see also Index: laryngeal consonant)
When laryngeals between consonants disappeared, they sometimes left a vowel in their place, as in Greek stásis, Sanskrit sthitis, Old English stede 'a standing (place)' from Proto-Indo-European *stH2tis. Before the advent of the laryngeal theory, a separate Proto-Indo-European vowel (called schwa indogermanicum) was reconstructed to account for these correspondences.
Finally, there were the nasal sounds n and m, the liquids l and r, and the semivowels y and w. When y and w occurred between consonants, they were replaced by the vowels i and u. The nasals and liquids functioning as nuclei of syllables in this position (like the final sounds of English bottom, button, bottle, butter) are traditionally written n̥, m̥, l̥, r̥. Some scholars dispense with these diacritical marks and with the distinction between syllabic i and u and nonsyllabic y and w, but this obscures certain distinctions, such as that between -wn̥- in *kwn̥su 'among dogs,' Sanskrit shvasu, and -un- in *tund- 'shove,' Sanskrit tundate.
In forming front vowels, the highest point of the tongue is in the front of the mouth; for back vowels, that point is in the back. High vowels are those in which the tongue is highest--closest to the roof of the mouth; mid vowels are made with the tongue between the extremes of high and low.
The four mid vowels (e, o, ē, ō) participated in a pattern of alternation called "ablaut." In the course of inflection and word formation, roots and suffixes could appear in the "e-grade" (also called "normal grade"; compare Latin ped-is 'of a foot' [genitive singular]), "o-grade" (e.g., Greek pód-es 'feet'), "zero-grade" (e.g., Avestan fra-bd-a- 'forefoot,' with -bd- from *-pd-), "lengthened e-grade" (e.g., Latin pēs 'foot' [nominative singular] from *pēd-s), and/or "lengthened o-grade" (e.g., English foot, Old English fōt).
There is some evidence for a similar pattern of alternation involving a, ā, and zero. Most instances of apparent a and ā, however, arose by "coloration" of e under the influence of a preceding or following H2 (e.g., Greek ag- 'lead' comes from *H2eg-, sta- 'stand' comes from *stH2-). Some cases of o, ō, and ē are likewise of laryngeal origin (e.g., Greek op- 'see' comes from *H3ekw-, dō- 'give' comes from *deH3-, thē- 'put' comes from *dheH1-). Among the high vowels, i and u did not participate in ablaut alternations but rather functioned primarily as the syllabic realizations of the consonants y and w, as in *leykw- 'leave,' zero-grade *likw-, parallel to *derk- 'see,' zero-grade *dr̥k-. Long ī and ū in the recorded languages derive in large part from sequences of i or u plus laryngeal, as in Latin vīvus 'alive' from *gwiH3wós.
The accent just before the breakup of the parent language was apparently mainly one of pitch rather than stress. Each full word had one accented syllable, presumably pronounced on a higher pitch than the others.
The imperfective aspect, traditionally called "present," was used for repeated actions and for ongoing processes or states--e.g., *stí-stH2-(e)- 'stand up more than once, be in the process of standing up,' *mn-yé- 'ponder, think,' *H1es- 'be.' The perfective aspect, traditionally called "aorist," expressed a single, completed occurrence of an action or process--e.g., *steH2- 'stand up, come to a stop,' *men- 'think of, bring to mind.' The stative aspect, traditionally called "perfect," described states of the subject--e.g., *ste-stóH2- 'be in a standing position,' *me-món- 'have in mind.'
Verb roots were by themselves either perfective (like *steH2- 'stand' and *men- 'think') or imperfective (like *H1es- 'be'). This basic aspect, however, could be reversed by morphological devices such as ablaut, suffixation, and reduplication. The stative aspect was normally marked by reduplication and the o-grade of the root in the indicative singular; it had personal endings that were partly distinct from those of the other two aspects.
From one aspect of a given verb, the shape and even the existence of the other two aspects could not be predicted; for example, *H1es- 'be' had only the imperfective aspect. Ways of forming imperfectives were especially numerous and often involved, in addition to their imperfective aspectual meaning, some other notion, such as performing the action habitually or repeatedly (iterative), or causing someone else to perform it (causative). One root could thus have several imperfective stems; for example, the root *H1er- 'move' had at least a causative form, *H1r-new- 'set in motion,' and an iterative form, *H1r-ske- 'go repeatedly.'
The Proto-Indo-European verb was also inflected for mood, by which the speaker could indicate whether he was making statements or inquiries about matters of fact; making predictions, surmises, or wishes about the future or about unreal but imagined situations; or giving commands. Compare English "If John is home now (he is eating lunch)," where the indicative is discusses a matter of fact, with "If John were home now (he would be eating lunch)," where the subjunctive were describes an unreal situation. There were two Proto-Indo-European suffixes expressing mood: -e- alternating with -o- for the subjunctive, corresponding roughly in meaning to the English auxiliaries 'shall' and 'will,' and -yeH1- alternating with -iH1- for the optative, corresponding roughly to English 'should' and 'would.' Verbs without one of these two suffixes were marked for mood and tense by their personal endings alone.
These personal endings basically expressed the person and number of the verb's subject, as in Latin amo 'I love,' amas 'you (singular) love,' amat 'he or she loves,' amamus 'we love,' and so on. In the imperfective and perfective aspects there were two sets of endings, distinguishing two voices: active, in which typically the subject was not affected by the action, and mediopassive, in which typically the subject was affected, directly or indirectly. Thus Sanskrit active yájati and mediopassive yájate both mean 'he sacrifices,' but the former is said of a priest who performs a sacrifice for the benefit of another, while the latter is said of a layman who hires a priest to perform a sacrifice for him. In the stative aspect there was originally no distinction of voice. (see also Index: active voice)
To mark mood and tense, imperfective verbs that did not have a mood suffix distinguished three subtypes of active and mediopassive endings: imperative, primary, and secondary. Verbs with imperative endings belonged to the imperative mood (used for commands)--e.g., *H1s-dhí 'be (singular),' *H1és-tu 'let him be.' Verbs with primary endings were marked as non-past (present or future) in tense and indicative in mood--e.g., *H1és-ti 'he is.' (Indicative mood signifies objective statements and questions.) Verbs with secondary endings were unmarked for tense and mood but were normally used as past indicatives (e.g., *H1és-t 'he was,' *gwhén-t 'he slew') and to fill out gaps in the imperative paradigm (e.g., *H1és-te or *H1s-té 'you [plural] were,' but also 'be [plural]'; *gwhén-te or *gwhn-té 'you [plural] slew,' but also 'slay [plural]'). To mark such forms unambiguously as past indicatives, an augment, usually consisting of the vowel e, could be prefixed--e.g., *é-gwhen-t 'he slew,' *é-H1es-t 'he was.'
Verbs in the perfective aspect without a mood suffix did not occur with primary endings and thus lacked a true present tense. Verbs in the stative aspect substituted a distinctive set of endings for those of the primary set but apparently used the imperative and secondary endings in the usual way to form a stative imperative and a stative past indicative.
Adjectives were nounlike words that varied in gender according to the gender of another noun with which they were in agreement, or, if used by themselves, according to the sex of the entity to which they referred; thus, Latin bonus sermo 'good speech' (masculine), bona aetas 'good age' (feminine), bonum cor 'good heart' (neuter), or bonus 'a good man,' bona 'a good woman,' bonum 'a good thing.' The neuter of an adjective was often identical with the masculine except for having different endings in the nominative and accusative cases. Feminine gender was either completely identical with the masculine or derived from it by means of a suffix, the two commonest being *-eH2- and *-iH2- (*-yeH2-).
Demonstrative, interrogative, relative, and indefinite pronouns were inflected like adjectives, with some special endings. Personal pronouns were inflected very differently. They lacked the category of gender, and they marked number and case (in part) not by endings but by different stems, as is still seen in English singular nominative "I," but oblique "my," "me"; plural nominative "we," but plural oblique "our," "us." (The oblique is any case other than nominative or vocative.)
Thus it is supposed that the Proto-Indo-European community knew and talked about dogs (*kwón-), horses (*H1ékwo-), sheep (*H3éwi-), and almost certainly cows (*gwów-) and pigs (*súH-). Probably all these animals were domesticated. At least one cereal grain was known (*yéwo-), and at least one metal (*H2éyos). There were vehicles (*wógho-) with wheels (*kwékwlo-), pulled by teams joined by yokes (*yugó-). Honey (*mélit-) was known, and it probably formed the basis of an alcoholic drink (*médhu), a word that survives in English mead. Numerals up to 100 (*kmtóm) were in use. All this suggests a people with a well-developed Neolithic (characterized by simple agriculture and polished stone tools) or even Chalcolithic (copper-using) technology.
For further progress, the linguistic findings must be correlated with archaeological evidence. Linguistic, historical, and geographic considerations suggest that the speakers of Proto-Indo-European were a relatively small and homogeneous Eurasian population group that underwent significant expansion and fragmentation in the period around 4000 BC. Some scholars believe that the Indo-Europeans were the bearers of the Kurgan (Barrow) culture of the steppes north of the Black Sea and the Caucasus and west of the Urals. (see also Index: Kurgan culture)
The Kurgan culture, however, was only one of a number of related steppe cultures extending across the entire Black Sea-Caspian Sea region, an area that was transformed about 4000 BC by the advent of horse-drawn wheeled vehicles and related innovations. It is probably best, therefore, to follow J.P. Mallory (In Search of the Indo-Europeans [1989]) in locating the speakers of Proto-Indo-European among the populations of this region, but not to attempt a more precise identification until further evidence is available.
A remote relationship between Indo-European and the Uralic languages is not improbable. Geographically, the earliest reconstructible locations of the two families are contiguous; lexically, there are strong resemblances in a number of basic words or word parts, including personal, demonstrative, interrogative, and relative pronouns, personal endings of verbs, the accusative case ending -m, and such words as those for 'water' and 'name'; typologically, the families are fairly similar--e.g., both have many suffixes, but few or no prefixes or infixes (elements inserted within words). The resemblances, however, are too few to permit the reconstruction of a common "Indo-Uralic" parent language; the two families, if they are related at all, must have separated thousands of years before the breakup of Proto-Indo-European.
If Indo-European is related to other language families--e.g., to Afro-Asiatic (which includes the Semitic languages) or to Kartvelian (which includes Georgian)--it must have diverged from them much earlier than it diverged from Uralic, because the number of cogent resemblances is much smaller. There is no significant evidence at present for a "Nostratic" superfamily embracing these and other groups.
m 'hundred' (Proto-Indo-European *kmtóm), which illustrates the change. The languages that preserve the palatal stops as k-like sounds are known as "centum" languages, from centum (/kentum/), the corresponding word in Latin. The satem languages are not geographically separated from one another by any recorded languages that preserve the palatals as stops; it is therefore inferred that the change to affricates (whence later spirants) occurred just once and spread over a cohesive dialect area of Proto-Indo-European.
Of the languages that share this change, however, Balto-Slavic shares with Germanic (including English) an m in certain case endings where other Indo-European languages, including Indo-Iranian, Armenian, and Albanian, have bh or a sound regularly developed from bh. Examples of the m ending include English the-m and Old Church Slavonic te-mu 'to those ones'; the bh and related sounds (ph, v, b) are illustrated in the following: Sanskrit té-bhyas 'to those ones,' Armenian noro-vk' 'with new ones,' Albanian male-ve 'to mountains,' Greek ókhes-phin 'with chariots,' Latin omni-bus 'for all.' Because Balto-Slavic and Germanic are neighbours, it is inferred that m replaced bh in these case endings just once in the parent language and that the area over which this innovation spread only partly overlapped the area that adopted affricated pronunciation of the palatals.
This pattern is general for changes dating from the time the parent language was breaking up into distinct languages. Each of the resulting languages shares some innovations with some of its neighbours, but only rarely do different innovations shared by two or more branches of Indo-European cover exactly the same territory.
Once the dialects had become differentiated enough to be distinct languages--certainly by 2500 BC in most cases--each largely went its own way, and agreements in developments since then are due either to borrowing across language boundaries (as in the notable convergences between Modern Greek, Albanian, Romanian, and the southernmost Slavic languages) or to parallel but independent workings out of the same base material.
In phonology, the most striking changes have been the loss or reduction, in many languages, of final or unaccented syllables and the loss, in several languages, of certain consonants between vowels, often followed by contraction of the resulting vowel sequence. Thus words in modern Indo-European languages are often much shorter than their Proto-Indo-European ancestors--e.g., English four, Armenian č'ork', colloquial Persian čār 'four' from *kʷetwóres; French vit (pronounced vi) 'lives' from *gʷíH₃weti; Russian dvestí 'two hundred' from *duwóyH₁ kmtóyH₁.
In the verb, where more endings originally had two syllables, loss of final syllables has had less serious consequences for morphology. Even here, however, some languages, including English, have totally or almost totally given up the marking of subject by personal endings. Compare English "I, we, you, they love" and "he, she loves" with the Spanish conjugation for 'love'--amo, amas, ama, amamos, amáis, aman--or the Russian version--ljubljú, ljúbish, ljúbit, ljúbim, ljúbite, ljúbjat.
Changes in noun inflection have generally involved simplification. Almost everywhere the dual number has been lost; in many languages the noun genders have been reduced from three to two (as in French, Swedish, Lithuanian, and Hindi) or lost entirely (as in English, Armenian, and Bengali). Only Slavic has complicated the gender system by imposing on the inherited distinctions contrasts of animate versus inanimate or of personal versus nonpersonal.
Everywhere except in the oldest Indo-Iranian languages the original eight Indo-European cases have suffered reduction. Proto-Germanic had only six cases, the functions of ablative (place from which) and locative (place in which) being taken over by constructions of preposition plus the dative case. In Modern English these are reduced to two cases in nouns: a general case that does duty for the vocative, nominative, dative, and accusative ("Henry, did Bill give John the letter?") and a possessive case continuing the old genitive ("Bill's letter"). In languages such as French and Welsh, nouns are no longer inflected for case at all. In some languages, to be sure, nouns have begun fusing with words placed directly after them to create new case systems, coexisting with relics of the old. Thus, Old Lithuanian had, in addition to seven inherited cases, an illative (place into), made by adding -n(a) to the accusative (peklosna 'into hell'); an allative (place to, toward), made by adding -p(i) to the genitive (Jesausp 'to Jesus'); and an adessive (place at which), made by adding -p(i) to the locative (Joniep 'in John').
Changes in the verb have been more complex. Besides loss or merger of old categories, many new forms have been created and many old forms have acquired new values. In Ancient Greek the focus of the stative aspect (perfect) has largely shifted from the present state ("he is dead") to the previous event that led to this state ("he has died"). As a result, the perfect came to mean the same as the perfective past (aorist), and it has therefore disappeared from Modern Greek. New forms created in Ancient Greek include future and future perfect tenses, based on the desiderative present forms (such as "he wants to walk") of the parent language.
In Germanic the principal new creation was the weak past tense (ending in a t or d), such as English loved, thought, German liebte, dachte, made by combining the verb stem with a past tense of the Germanic verb for 'do.' (The strong past tense formed by vowel alternations, like "sing, sang," "run, ran" comes from the Proto-Indo-European stative aspect.)
In some languages participles have come to function as finite verbs. Thus in Hindi admi larki-ko dekhta 'the man sees the girl,' dekhta 'sees' is etymologically a participle 'seeing,' agreeing in number and gender with the subject admi 'man.' In the past tense, admi-ne larki dekhi 'the man saw the girl,' the verb dekhi is etymologically a past passive participle 'seen,' agreeing in gender and number with the object larki 'girl,' and the subject is marked with an instrumental ending.
The influence of non-Indo-European languages on the sounds and grammar of Proto-Indo-European is not demonstrable, partly because there is no direct evidence about the languages that were in contact with Indo-European before roughly 3000 BC. It can be surmised, however, that some words are loans--e.g., *péleku- 'ax,' which denotes an object likely to have been imported or learned of from neighbours with superior technology and which cannot be analyzed into a known Indo-European root plus a known Indo-European suffix.
When Indo-European languages have been carried within historic times into areas occupied by speakers of other languages, they have generally taken over a number of loanwords, as with English and Spanish in the Americas or Dutch in South Africa. Aside from the special case of pidgin and creole languages, however, there has been comparatively little effect on sounds and grammar. These have been significantly affected within historic times only when an Indo-European language has been spoken in prolonged close contact with non-Indo-European speakers, as with Ossetic (an Iranian language) in the Caucasus, or when its speakers have been very strongly influenced culturally by speakers of a non-Indo-European language, as with Persian, in which Arabic plays much the same role as Latin does in English.
In prehistoric times most branches of Indo-European were carried into territories presumably or certainly occupied by speakers of non-Indo-European languages, and it is reasonable to suppose that these languages had some effect on the speech of the newcomers. For the lexicon, this is indeed demonstrable in Hittite and Greek, at least. It is much less clear, however, that these non-Indo-European languages significantly affected the sounds and grammar of the Indo-European languages that replaced them. Perhaps the best case is India, where certain grammatical features shared by Indo-European and Dravidian languages appear to have spread from Dravidian to Indo-European rather than vice versa. For most other branches of Indo-European, any attempt to claim prehistoric influence of non-Indo-European languages on sounds and grammar is rendered almost impossible by ignorance of the non-Indo-European languages with which they might have been in contact. (W.C./J.H.Ja.)
Hattic (or Hattian), also misleadingly called Proto-Hittite, is the best-known substratum language. It is completely unrelated to Hittite and its sister languages as well as to Hurrian, a language also spoken in Anatolia.
The Anatolian group of Indo-European languages consists of Hittite, Palaic, Luwian, Hieroglyphic Luwian, Lydian, and Lycian. Hittite, Palaic, and Luwian are known from 2nd-millennium cuneiform texts found in excavations at Bogazköy-Hattusa since 1905; Hieroglyphic Luwian is found on scattered inscriptions and seals from Anatolia (mainly the southern area) and northern Syria dating mainly from later times (i.e., between c. 1200 and 700 BC, although there are earlier examples from the empire period, c. 1400-c. 1190 BC). Lydian and Lycian are known from texts in alphabetic script from c. 600 to 200 BC. It seems reasonable to add to this list the Carian language of southwest Anatolia as well as other less well documented languages such as Sidetic. Farther east, in the Caucasus region centring around Lake Van, Hurrian of the 3rd and 2nd millennia BC was replaced in the 1st millennium BC by the related Urartian language. Both of these languages are definitely non-Indo-European.
There is a tendency among linguists to postulate an eastern route of entry into Anatolia by way of the Caucasus, because certain grammatical features--e.g., the loss of the feminine gender--might be explained as having been caused by prolonged contacts with Caucasian languages. It is likely that the Indo-European forebears of the later speakers of Hittite, Palaic, Luwian, and Lydian entered Anatolia together, following a common route, because the Anatolian languages share a considerable number of losses as well as innovations that presuppose a long common past.
In the central parts of Anatolia, within the bend of the Halys River (modern Turkish, Kizil Irmak), and in the northern regions, Hittite and Palaic were profoundly influenced by Hattic as a substratum language. The Hattian culture also changed the political and religious concepts of the newcomers, and a clear cultural dependency of the Indo-Europeans on the older Hattian population is evident. Some scholars have stressed the likelihood that farther to the south the Luwians might have been conversant with a different substratum. In view of the absence of textual evidence, and because knowledge of the Luwian vocabulary is rather restricted, it is perhaps not surprising that this possible substratum element escapes definition. (For the history of Anatolia in the 2nd and 1st millennia BC, see TURKEY AND ANCIENT ANATOLIA: Ancient Anatolia.)
The most important invaders of Anatolia in the "Dark Age" (after 1190 BC) were the Phrygians. Their language is definitely Indo-European, but it bears no relationship to the Anatolian subgroup. Rather, it seems akin to Thracian, Illyrian, or possibly Greek. Greek, in the second half of the 1st millennium BC, and, later, Latin, from the 2nd century onward, entered central Anatolia as languages of a ruling caste. Much earlier--beginning in Mycenaean times--the west coast had attracted Greek settlers. In the first half of the 1st millennium, the southern and northern shores also attracted Greek-speaking peoples. To the east in the Caucasus region, other Indo-Europeans, the Armenian-speaking invaders, penetrated into the former Urartian territory well before the beginning of the Persian period, probably in the 7th and 6th centuries BC. During Persian times, a Persian ruling caste entered eastern and also northeastern Anatolia and was still clearly recognizable in the Hellenistic and Roman periods (e.g., in Bithynia, Pontus, Cappadocia, and Commagene). Late data on names and scattered remarks made by Fathers of the Church indicate that until late Roman and perhaps even Byzantine times, some Anatolian dialects remained in use in certain isolated parts of the interior.