Languages of the World

Introduction

Languages may be classified either genetically or typologically. A genetic classification assumes that certain languages are related in that they have evolved from a common ancestral language. This form of classification employs ancient records (such as those for Latin) as well as hypothetical reconstructions of the earlier forms of languages, called protolanguages. Because information on the genetic affiliations of languages is sufficiently extensive, world surveys of languages are necessarily oriented in that way--sometimes exclusively so and sometimes in conjunction with typological classifications. Typological classification is based on similarities in language structure. Individual frames of reference in language typology are not known well enough to permit a worldwide typological classification.

Before the conclusive demonstration that unwritten languages could be classified genetically, they were often relegated to a typological classification, which at one time was denigrated by scholars. Since 1917, however, the prestige of some kinds of typology has risen--in particular, that of grammatical typology. The best-known typological frame of reference represents the grammar of a language, either as a whole or as a subsystem. Once a genetic classification has been established, typological classification may be superimposed on it in order to show change of language type--as from a predominantly inflectional language (such as Proto-Germanic) to a predominantly isolating one (such as modern English)--or to show features that are shared by languages in neighbouring branches in the same family (e.g., Celtic and Germanic in Indo-European). The ultimate grammatical typology is that which treats subsystems that are, in some sense, universal to all human languages.

Lexical typologies, based on similarities in vocabulary structure, have been used in cognitive anthropology and psycholinguistics (e.g., perception of colours and use of colour terms). The sociolinguistic frame of reference in typology provides classifications for varieties of language in terms of their functions and their ways of identifying social groups and cultural spaces; in addition, it brings order and integration to problems concerning national standards that are faced by new nations that have many nonstandard and unwritten languages as well as languages that make use of writing.

A few points of terminology should be explained before the world's languages are discussed further. Language family is the label often used for a conservative genetic classification, one that can be attested only when an abundance of cognates (related words) is available. Phylum is the label for a liberal genetic classification that is attested with fewer cognates; it encompasses language families. Although a given phylum will have greater extension than any of the families included in it, only fragments of phonology will be reconstructible in the protolanguage. In actual linguistic usage, however, the term family is often employed to refer to a group that is technically a phylum--e.g., the Afro-Asiatic (Hamito-Semitic) family, the Sino-Tibetan family.

The label language isolate is used for a language that is the only representative of a language family, such as Basque or the extinct Sumerian language; the presumptive but unknown sister languages of isolates are dead and unrecorded. A language isolate may be classified, along with normal language families, under the rubric of an extensive phylum (e.g., Korean is sometimes classified as a member of a hypothetical Ural-Altaic phylum) or left wholly unclassified (e.g., the Ainu language of Japan). The label pidgin-creole is used for a language that has had so much vocabulary change that cognates for reconstructing the protolanguage from which it descended cannot be found. A pidgin is a contact language used for communication between groups having different native languages. When a pidgin becomes the native language of a community, it is customarily called a creole.

This article begins with a survey of world languages based on geographic regions of unequal size: huge and sprawling areas for the peripheral regions of Africa, Oceania, and the Americas but relatively compact areas for the focal regions within Eurasia. Nine regions--six in Eurasia, in all of which writing and standard languages are widespread--constitute a convenient basis for comparison and contrast. The larger part of the article consists of more detailed examinations of the languages of the world arranged by genetic affinities.

Facets of the subject of language and human communication are treated in a variety of articles. For a full account of the theory and methods of linguistic science, see the article LINGUISTICS. For information on such subjects as the characteristics of language, language variants (slang, jargon), speech production, and the acquisition of language, see the article LANGUAGE. For a full account of phonetics and the pathology of speech, see the article SPEECH. For information on written languages and writing systems, see the article WRITING.

For coverage of related topics, see SPECTRUM, section 514, and the Index.


INTRODUCTORY SURVEY

LANGUAGES OF EUROPE

The great majority of the languages spoken in Europe are of Indo-European and Uralic (especially Finno-Ugric) affiliation. In terms of numbers of speakers, however, the people in Europe who speak the languages of these families are now fewer than those in non-European countries who also speak such languages. For example, Latin America (rather than Europe) is now the chief locus of the Spanish language.

An unusually small degree of genetic diversity is found among European languages: there are fewer language families in Europe than in any other continent-size region of the world. In addition, literary traditions that have resulted in the preservation of earlier forms of present-day languages are found to a high degree among these languages. Every European language with a writing tradition has developed at least one standard that is recognized nationally, and the national standard often coexists with recognized regional standards.

A few European languages are used internationally, as lingua francas, but there is a low degree of pidgin and creole usage in Europe today.

Typological classifications have been superimposed on genetic classifications of European languages in particular. For example, the Italic branch of Indo-European languages may be grouped with the Greek, Celtic, and Germanic branches on the basis of certain structural features, as can Armenian with Greek and the Indo-Iranian languages, and so forth.


Indo-European languages.

The languages of seven of the nine extant branches of the Indo-European language family are spoken in Europe. Variability in estimates of the number of particular languages reflects variation in the criteria used (e.g., mutual intelligibility between neighbouring dialects, known common or separate history versus sociocultural factors such as separate literary traditions or status as national languages of politically independent units) as well as in the time period for which the criteria are applied. Thus, for example, it is possible to say on linguistic grounds that there are nine extant languages in the Romance subgroup of the Italic branch: Portuguese, Spanish, Catalan, French, Romansh, Ladin, Friulian, Italian, and Romanian. In applying the criterion of separate literary tradition, the list would be expanded by the addition of Provençal and Sardinian. To apply the criterion of status as a national language would reduce the list because Ladin, Friulian, Provençal, and Sardinian are not national languages; but the picture is complicated by the fact that Sardinia was once politically independent, and Andorra, in which Catalan is spoken, has not always been independent. (see also Index: Romance languages, Sardinian language)
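The effect of each criterion on the count can be sketched in a few lines of Python. This is only an illustration of the paragraph's arithmetic: the function name and the criterion labels are assumptions introduced here, and the "national" result takes the paragraph's statements at face value, ignoring the complications it notes for Sardinia and Andorra.

```python
# Lists taken from the paragraph above; criterion labels are illustrative.
LINGUISTIC = [
    "Portuguese", "Spanish", "Catalan", "French", "Romansh",
    "Ladin", "Friulian", "Italian", "Romanian",
]
LITERARY_ADDITIONS = ["Provençal", "Sardinian"]
NOT_NATIONAL = {"Ladin", "Friulian", "Provençal", "Sardinian"}


def romance_languages(criterion):
    """Return the Romance languages counted under one hypothetical criterion."""
    if criterion == "linguistic":
        return list(LINGUISTIC)
    # A separate literary tradition adds two languages to the list.
    expanded = LINGUISTIC + LITERARY_ADDITIONS
    if criterion == "literary":
        return expanded
    if criterion == "national":
        # National-language status removes the four non-national languages.
        return [name for name in expanded if name not in NOT_NATIONAL]
    raise ValueError(f"unknown criterion: {criterion}")


for criterion in ("linguistic", "literary", "national"):
    print(criterion, len(romance_languages(criterion)))
```

Under these assumptions the three criteria yield 9, 11, and 7 languages, respectively.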

Similar difficulties in counting separate languages exist for all the branches in which several languages are spoken. In terms of areas of high mutual intelligibility (which do not entirely reflect historical development), there are only five modern Germanic languages: English, Frisian, Netherlandic-German (including Afrikaans and Yiddish), Insular Scandinavian, and Continental Scandinavian. If literary tradition and national criteria are considered, the number is increased by the division of Netherlandic-German into Standard High German, Low German, Dutch-Flemish, Afrikaans, Luxemburgian, and Yiddish; the division of Insular Scandinavian into Icelandic and Faroese (Faeroese); and the division of Continental Scandinavian into Norwegian (which is further subdivided into New Norwegian [or Nynorsk] and Dano-Norwegian [or Bokmål]), Danish, and Swedish.
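The Germanic counts can be tallied in the same spirit. The sketch below is illustrative only: the dictionary layout and function names are assumptions, and it supposes that English and Frisian each remain a single language under the literary and national criteria, which the paragraph implies but does not state.

```python
# Intelligibility groups mapped to the standards named in the paragraph above.
GERMANIC = {
    "English": ["English"],
    "Frisian": ["Frisian"],
    "Netherlandic-German": [
        "Standard High German", "Low German", "Dutch-Flemish",
        "Afrikaans", "Luxemburgian", "Yiddish",
    ],
    "Insular Scandinavian": ["Icelandic", "Faroese"],
    "Continental Scandinavian": ["Norwegian", "Danish", "Swedish"],
}
NORWEGIAN_STANDARDS = ["New Norwegian (Nynorsk)", "Dano-Norwegian (Bokmål)"]


def count_by_intelligibility():
    """Count areas of high mutual intelligibility."""
    return len(GERMANIC)


def count_by_standards(split_norwegian=False):
    """Count languages once literary and national criteria are applied."""
    names = [name for members in GERMANIC.values() for name in members]
    if split_norwegian:
        # Optionally count the two Norwegian standards separately.
        names.remove("Norwegian")
        names.extend(NORWEGIAN_STANDARDS)
    return len(names)


print(count_by_intelligibility())                 # 5
print(count_by_standards())                       # 13
print(count_by_standards(split_norwegian=True))   # 14
```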

For the Slavic languages there are 13 literary standards, but between the nuclei formed by these norms there are scarcely any linguistic boundaries, because transitional dialects connect adjacent areas. In terms of intelligibility and, to some extent, in terms of shared features, the Slavic literary norms can be grouped into three zones: East Slavic (including Russian, Belarusian, and Ukrainian), West Slavic (including Polish, Kashubian, Lower [or Low] Sorbian, Upper [or High] Sorbian, Czech, and Slovak), and South Slavic (including Slovene, Serbo-Croatian, Macedonian, and Bulgarian).

Language boundaries are more clear-cut for the modern living languages in the remaining European branches of Indo-European: Celtic (the physical separation of the speakers of the languages contributes to the separate identification of Welsh, Breton, Irish, and Scottish Gaelic), Baltic (the literary and political separation coincide with the separation of Lithuanian and Latvian), Greek (the separate historical development and lack of mutual intelligibility separate Modern Greek from Tsakonian), and Albanian (the political unity of Albania contributes to the single-language identification of Gheg and Tosk, its two divergent dialects). (see also Index: Celtic languages, Baltic languages, Greek language, Albanian language)

Dialects of two languages in the Indo-Iranian branch of Indo-European also are or were spoken in Europe: the Jassic dialect of Ossetic, an Iranian language, formerly spoken in Hungary; and the European dialects of Romany, which was spread by Gypsies throughout Europe and into America. It may be, however, that only in Wales, Finland, and the Balkans does Romany still serve as a native language, though Welsh and Finnish Romany have few speakers left. (see also Index: Ossetic language, Romany language)

A number of earlier Indo-European languages that died out without descendants are known from written records and comments by contemporaries; these include several in the Italic branch (such as Oscan, Umbrian, Faliscan, Venetic), at least four in the East Germanic group (including Burgundian, Ostrogothic, Visigothic, and Vandalic), and three in the Celtic branch (Gaulish, Cornish, and the recently extinct Manx). The classification as Indo-European or non-Indo-European for many other extinct languages (such as Pictish, spoken in what is now northern Scotland) remains uncertain because of the scarcity of data.

For more information on the Indo-European languages of Europe, see Greek language; Italic languages; Romance languages; Germanic languages; English language; Celtic languages; Baltic languages; Slavic languages; Albanian language.


Uralic languages.

In addition to the Indo-European languages, all but two of the languages of the Finno-Ugric branch of the Uralic family also are spoken in Europe. As in the case of Indo-European languages, variation exists in the enumeration of separate languages. For example, several varieties of Sami (Lapp) are mutually unintelligible but often are classified as dialects of a single language. The various types also have been classified according to geographic areas or national boundaries (Norwegian, Swedish, Finnish, and Russian Sami). In Baltic-Finnic, the Finno-Ugric subgroup most closely related to Sami, the Finnish, Karelian, Veps, Ingrian, Estonian, Livonian, and Votic languages often are linked by transitional dialects between the central areas of a given pair. The other Finnic languages, Mari and Mordvin, and all three languages of the Permic subgroup, Udmurt (formerly Votyak), Permyak, and Komi, are spoken much farther to the east, in the central area of eastern European Russia. (see also Index: Finno-Ugric languages, Sami language, Permic languages)

The Samoyedic branch of Uralic is represented by the Nenets, who live in northernmost Russia from the mouth of the Northern Dvina River eastward into North Asia. The remaining Uralic language of Europe, which belongs to the Ugric subgroup, is Hungarian. See Uralic languages. (see also Index: Nenets language)


Other languages.

Maltese.
Maltese, which is spoken in Malta, is a Semitic language descended from a dialect of Arabic. It was so long isolated from other dialects of Arabic and so heavily influenced by Italian that the resultant loss of mutual intelligibility with other Arabic speakers justifies its usual classification as a separate Semitic language. (see also Index: Maltese language)

Basque.
Basque, spoken in the Pyrenees in Spain and France, is the only other living language of western Europe that does not belong to the Indo-European family. Numerous attempts to link Basque genetically to other languages have been inconclusive. See Basque language. (see also Index: Basque language)

Turkic languages.
In addition to Turkish, which is spoken by a number of people in Bulgaria and elsewhere in the Balkans, several languages of the Turkic language group (classified as a subfamily of the Altaic language family) are spoken entirely in eastern Europe. Chuvash, the most divergent Turkic language, is found mainly in Chuvashia in Russia; Tatar in Tatarstan and adjacent areas and in Romania and Bulgaria; Bashkir in Bashkortostan; Gagauz in Ukraine and Moldova and in the Balkans; and Karaim in southern Ukraine and Lithuania. Most or all of the speakers of Crimean Turkish were removed to the Uzbek S.S.R. (now Uzbekistan) after World War II, although since 1989 a number have returned to the Crimean Peninsula. See Altaic languages. (see also Index: Chuvash language, Tatar language, Bashkir language, Gagauz language, Karaim language)

Extinct languages.
The existence of a number of long-extinct non-Indo-European languages of Europe is known through the records of the Greeks and Romans and also through the preservation of varying amounts of written records. The most extensive records are those in the still undeciphered Etruscan, which is known to have been spoken in Italy from the 8th century BC to the 4th century AD (see Etruscan language). Several languages were spoken in the Iberian Peninsula, of which Iberian (preserved in a few inscriptions and many coins) was spoken along the Ebro River and at one time as far east as the Rhône River. (see also Index: Etruscan language, Iberian language)


LANGUAGES OF SOUTH ASIA

The genetic classification of the languages of India, Sri Lanka, Bangladesh, Pakistan, Nepal, and Bhutan includes two subgroups-- Indo-Aryan (also called Indic) and Iranian--of a single branch of Indo-European (called Indo-Iranian), some indigenous language families (such as Dravidian), a few language isolates (such as Burushaski), and some Sino-Tibetan languages. (see also Index: Indo-Aryan languages, Iranian languages)


Indo-Iranian languages.

Except for Romany and the few Dardic languages spoken in Afghanistan, all the languages of the Indo-Aryan subgroup of the Indo-Iranian branch of Indo-European are spoken in South Asia. It is difficult to identify language boundaries in the Indo-Aryan group because, between any pair of literary standards, "transitional" dialects grade into one another, with no clear-cut language barriers. The problem is further complicated by the enormous dialect differentiation in most of the Indo-Aryan languages. In terms of lack of mutual intelligibility between literary standards, there are more than 20 Indo-Aryan languages. Although Sanskrit is a classical Indo-Aryan language, preserved in writing, it also enters so deeply into the vocabulary of present-day languages as to become, in some cases, the salient mark differentiating two dialects of one language. Thus, Hindi of India differs linguistically from Urdu of Pakistan chiefly in that the former may be heavily Sanskritized in vocabulary and the latter not. (see also Index: Sanskrit language, Hindi language, Urdu language)

Ethnolinguistic loyalties also may increase the number of languages distinguished--e.g., the separate recognition of Bengali and Assamese. Most of the Indo-Aryan languages are spoken by many millions of speakers--e.g., Bengali, Assamese, Hindi, Marathi, Maithili, Maghi, Bhojpuri, Gujarati, Oriya, Sinhalese (in Sri Lanka), Sindhi, and Nepali. There are also large numbers of speakers of Indo-Aryan languages (especially West Hindi) in South Africa, in various parts of the South Pacific, and in South America (particularly Suriname and Guyana).

The languages of the Dardic subgroup differ sufficiently from the other Indo-Aryan languages as to be sometimes classified as Iranian rather than Indo-Aryan or as a separate subbranch coordinate with the Indo-Aryan and Iranian subbranches. Kashmiri, spoken in Jammu and Kashmir, is the only Dardic language with a literary tradition. Shina also is spoken in Jammu and Kashmir; other Dardic languages, which are spoken mostly in Pakistan, have relatively few speakers. (see also Index: Kashmiri language, Shina language)

Some speakers of at least four Iranian languages also are found in South Asia, including Pashto speakers in Pakistan and Balochi (Baluchi) speakers in both Pakistan and India (see Indo-Iranian languages).


Dravidian languages.

Although the greatest concentrations of Dravidian speakers are in southern India, the more than 20 languages of this family are widespread in India, and one language, Brahui, is isolated in Pakistan, separated from its nearest sister language by 800 miles. Four Dravidian languages have long literary traditions and are spoken by many millions: Telugu, Tamil, Malayalam, and Kannada. Tamil speakers also are found in Sri Lanka, Malaysia, Indonesia, Myanmar (Burma), Vietnam, and South Africa and in scattered island and coastal areas around the world. Among other, less widespread Dravidian tongues of India are Gondi, Tulu, Kurukh, and Kui. No convincing remote relationships between the Dravidian family and other families have been proposed (see Dravidian languages). (see also Index: Tamil language)


Austroasiatic languages.

The 16 or so Munda languages are all spoken in India, Pakistan, Nepal, and Bangladesh. Most scholars classify them as a language family within the Austroasiatic stock. Santhali is the Munda language with the greatest number of speakers (a few million); Mundari, Ho, Sora, Kharia, and Korku have significantly fewer speakers. Some scholars include Nahali, spoken by a few thousand people in southwestern Madhya Pradesh, among the Munda languages. Khasi, spoken in Assam, Meghalaya, and a number of other Indian states, is a member of the Mon-Khmer language family. See Austroasiatic languages. (see also Index: Santhali language)


Sino-Tibetan languages.

Speakers of languages of most of the branches of the Sino-Tibetan language family are to be found in South Asia. All the languages of the Bodo (Bodo-Garo) branch of the Bodo-Naga-Kachin language group are spoken in Assam. Naga languages are spoken in scattered locations from eastern Nepal into Myanmar. The Kachin (Ching-p'o) languages are centred in northern Myanmar, but some dialects are spoken in Assam, where there are also some speakers of Kuki (Kuki-Chin) languages and of Burmese (Burmese-Lolo) languages. Dialects of the various divisions of the Tibetan language are distributed from Kashmir to Bhutan and southward into India (e.g., Balti, Sherpa, Lhoke, Spiti). Speakers of close to 50 Gyarung-Mishmi (or Himalayan) languages are found in northeastern India, with their greatest concentration in Nepal. (See Sino-Tibetan languages). (see also Index: Bodo-Garo languages, Kachin language)


Other languages.

Tai languages.
Some speakers of Khamti, a Tai language, live in Assam, where Ahom, another Tai language, is still used as a ceremonial language in religious rituals but is no longer spoken. See Tai languages. (see also Index: Ahom language)

Use of English.
In all parts of postcolonial South Asia, including Sri Lanka, some people know English; these speakers, although relatively few in number, are the people most likely to travel to a state in which a South Asian language unknown to them is spoken. Hence, English is de facto the current interstate and international language of South Asia, although many Indians would prefer to adopt another language, such as Hindi or a Dravidian language, as the national language. (see also Index: English language)

Burushaski.
Burushaski, spoken by some 50,000 people in far northeastern Pakistan, is without even remote known relatives. (see also Index: Burushaski language)


LANGUAGES OF NORTH ASIA

The languages of North Asia are those spoken from the Arctic Ocean to South Asia and China and from the Caspian Sea and Ural Mountains in the west to the Pacific Ocean in the east. In genetic classification most languages of North Asia belong to the Uralic family, to one of the three families of the Altaic stock (Turkic, Mongolian, and Manchu-Tungus), or to Indo-European. The genetic affiliations of the Paleo-Siberian languages, spoken exclusively in this region, are uncertain at present. Scholars have hypothesized that some of the languages once may have been American Indian languages whose prehistoric speakers backtracked from the New World into North Asia. That preindustrial peoples traversed the Arctic waters is demonstrated by the presence of Eskimos on both the Russian and Alaskan shores of the Bering Strait. Some scholars have claimed that all languages indigenous to North Asia, except the Paleo-Siberian ones and the recently intrusive Russian language, are genetically related in a Ural-Altaic phylum. This liberal classification, however, is questioned by many scholars. (see also Index: Ural-Altaic languages)

Whether or not Altaic and Uralic are related to each other, there is no doubt that languages of both stocks share many typological features, such as vowel harmony, agglutination (a type of word formation in which word elements are added together but still retain a separate, definite meaning), and a restriction against combining a plural noun with a quantifier (as though, in English, the plural noun "girls" had to appear as a singular noun in the phrase "five girl," rather than "five girls").
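The quantifier restriction can be made concrete with a toy function built on the paragraph's own English analogy. It is a hypothetical illustration, not a model of any particular Uralic or Altaic grammar; the function and parameter names are assumptions.

```python
def quantified_phrase(quantifier, singular, plural, plural_after_quantifier):
    """Build a phrase such as 'five girls' under one of two agreement rules.

    plural_after_quantifier=True mimics English, where the noun is plural
    after a quantifier; False mimics the restriction described above, where
    the noun stays singular.
    """
    noun = plural if plural_after_quantifier else singular
    return f"{quantifier} {noun}"


# English-style agreement: the noun takes its plural form.
print(quantified_phrase("five", "girl", "girls", plural_after_quantifier=True))
# Restriction described above: the noun stays singular after the quantifier.
print(quantified_phrase("five", "girl", "girls", plural_after_quantifier=False))
```

The first call prints "five girls" and the second "five girl", matching the contrast drawn in the text.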


Uralic languages.

Languages from three branches of Uralic are spoken in North Asia; their speakers are few in number. The Yukaghir family, which may be an offshoot of Early Uralic, includes one living language, Yukaghir (spoken by a few hundred people south of the Arctic Circle on tributaries of the Kolyma River and in the tundra between the Indigirka and Alazeya rivers). An extinct Yukaghir language, Chuvan (or Chuvantsy), was spoken until the 20th century on the Anadyr River. The two Ob-Ugric languages--Mansi (also called Vogul) and Khanty (also called Ostyak)--are spoken on the Ob River and its southwestern tributaries. All the languages of the Samoyedic branch are spoken in North Asia: Nenets, speakers of which are scattered from the mouth of the Yenisey River westward to the mouth of the Northern Dvina; Enets, also centred on the Yenisey; Nganasan, spoken on the Taymyr Peninsula in Siberia; and Selkup, spoken in a region lying south of that of the Enets speakers, between the Taz and Tym rivers. Another southern Samoyedic language, Kamas (Sayan), was functionally extinct by 1987 (see Uralic languages). (see also Index: Uralic languages, Yukaghir language, Samoyedic languages)


Altaic languages.

Turkic languages.
The Turkic languages are remarkable for their lack of diversity in spite of their wide occurrence in all the Eurasian regions except South Asia and Southeast Asia. Several of the numerous Turkic languages might be considered as a single language if it were not for the fact that mutual intelligibility between groups is impaired by differential borrowing from the various unrelated languages encountered in different regions. Thus, a Turkish speaker from Turkey might understand or largely understand an Uzbek speaker with more ease than the Uzbek speaker would understand Turkish, which has many loanwords from Persian and Arabic. Educated speakers of Turkic languages are able to read written materials in other Turkic languages after some adjustment to their varying spelling conventions and sound correspondences. Because such differences are identified with different Turkic ethnic groups, it is customary to identify the larger of these ethnic groups as speaking different Turkic languages, although some degree of intelligibility exists between them, as it does between the Uzbek, Bashkir, and Tatar languages. In addition, some languages have dialects that are transitional between two recognized language groups; e.g., some dialects of Karakalpak are said to be transitional to Turkmen, and others are said to be transitional to Uzbek. (see also Index: Altaic languages, Turkish language, Uzbek language, Kara-Kalpak language)

In North Asia, Turkic languages are distributed from the southern extension of North Asia northeastward through central Siberia and include Turkmen in Turkmenistan, Iran, and Afghanistan; Uzbek in the Central Asian countries, primarily Uzbekistan, and in Afghanistan; Kyrgyz in Kyrgyzstan and in neighbouring areas from Afghanistan to China; Karakalpak in Karakalpakstan republic in Uzbekistan; and Kazak in Kazakstan. Six or seven Turkic groups immediately north of western Mongolia are much smaller in terms of both numbers of speakers and the region they inhabit; in northern Siberia the Yakut extend from Sakha republic (Yakutia) west to Taymyr autonomous okrug (district), where the Dolgan people speak a Yakut dialect.

Mongolian languages.
The Mongolian languages are dispersed throughout Central Asia from Afghanistan to Manchuria, occupying large parts of North and East Asia. The problem of recognizing language boundaries (i.e., of distinguishing separate languages) in the Mongolian family is complicated by the fact that differences between dialects are exaggerated in areas where Mongolian speakers have borrowed features of different unrelated languages but are minimized in areas where one dialect is spoken as a lingua franca throughout an extensive region. Among the Mongolian languages are Mogol, spoken in Afghanistan, where it has been influenced by Iranian and Turkic languages; Monguor, spoken in Kansu province of China and in Tibet, with noticeable effects of both Tibetan and Chinese in the language; and Daghur, spoken mainly in Inner Mongolia and heavily influenced by Tungus languages. Additional languages include Ordos in Inner Mongolia, Kharachin in China, Oyrat in the Sino-Russian border area from Kyrgyzstan to the Altai Mountains, and Buryat from Buryatia into Inner Mongolia. Some degree of mutual intelligibility exists between some of these, but this may be in part the result of the lingua franca use of Khalkha, the official language of Mongolia. (see also Index: Mogol language, Daghur language)

Manchu-Tungus languages.
Speakers of the Manchu-Tungus languages are scattered from central interior Siberia to the shores of the seas of Japan and Okhotsk, including the Kamchatka Peninsula and Sakhalin Island. Those not near the coast live generally along the banks of the major rivers--the Yenisey, Tunguska, Khatanga, Lena, Amur, and Sungari. Detailed information on most of the Manchu-Tungus languages is scanty, and language names usually coincide with politico-cultural groups, rather than being based on a comparison of linguistic features or knowledge of mutual intelligibility. Borrowing that resulted from contact with speakers of Samoyedic (Uralic) languages to the west and northwest, Mongolian languages and Sinitic (Chinese) languages to the south, and the various Paleo-Siberian languages to the north and east has further complicated the subclassification of the Manchu-Tungus languages by increasing the superficial differences among them.

Most speakers of Manchu-Tungus languages are bilingual in the official language of their country, and many are replacing their native languages with Russian or Chinese. After the Manchu in China, the next best known and most numerous of the Manchu-Tungus peoples are the Evenk. Other groups include the Even (or Lamut), Nanai (or Gold), and other relatively small tribes. For more information on the Turkic, Mongolian, and Manchu-Tungus languages, see Altaic languages.


Paleo-Siberian languages.

Most of the peoples whose languages are grouped together under the catchall category of Paleo-Siberian live in northeasternmost Siberia in the area between the East Siberian Sea and the Sea of Okhotsk, including the Kamchatka Peninsula, along the coast of the Sea of Okhotsk as far south as the Amur River, and on Sakhalin Island; peoples of another Paleo-Siberian group live far to the west along the middle and upper Yenisey River. The Paleo-Siberian languages form three groups that are not only not related to each other but also have not been demonstrated to be related to any other genetic groups. (Another group sometimes classified as Paleo-Siberian, the Yukaghir, is now considered by some linguists to be a member of the Uralic language family, perhaps an offshoot of Early Uralic.)

The northernmost and most widespread of these linguistic groups and the only one that includes more than one living language is the Luorawetlan family, which consists of Chukchi, Itelmen (Kamchadal), and Koryak. Most scholars now classify Kerek and Aliutor, once considered to be dialects of Koryak, as independent languages. (see also Index: Luorawetlan languages)

Nivkh (Gilyak), spoken on Sakhalin Island and in the coastal and inland Amur River country of the mainland, has no known linguistic relatives. Ket (or Yenisey-Ostyak) is the only language of the Yeniseian or Yenisey-Ostyak family that is still spoken. Ket speakers live along the upper and middle Yenisey River, as did the speakers of its sister languages, Kott (Cottian-Manu), which became extinct in the 19th century, and Assan (Asan) and Arin, both of which became extinct in the 18th century (see Paleo-Siberian languages). (see also Index: Nivkh language, Ket language, Arin language)


Other languages.

Indo-European languages, like Russian (a Slavic language), were introduced to North Asia only relatively recently. They include the Iranian languages in the southwestern extension of North Asia (Tajik, or Western Farsi [Persian], in Tajikistan and Balochi in Turkmenistan) and the long-extinct Tocharian, which penetrated into Central Asia as far as Chinese Turkistan (see Indo-European languages: Indo-Iranian languages and Tocharian language).


Writing and literacy in North Asia.

The earliest stimulus toward writing in North Asia was from China. The latest stimulus, from Soviet Russia, brought literacy to those Altaic peoples whose languages were unwritten in tsarist times. In contrast to the written Altaic languages, the Paleo-Siberian languages in general remain unwritten. Soviet educational policy encouraged the use of native languages for education and for teaching writing. The standard form of writing Tajik, for example, is in the Cyrillic alphabet, and knowledge of this alphabet facilitated later learning of Russian, which is the modern lingua franca of North Asia. In the Mongol empire of the 13th century, Turkic languages were used as languages of administration across North Asia from the Caspian Sea to Manchuria and, initially, in adjacent Eurasian regions conquered by Genghis Khan.


LANGUAGES OF SOUTHWEST ASIA

Languages spoken today in the area from Iran westward to the Mediterranean (in Iran, Iraq, Saudi Arabia, Jordan, Syria, Lebanon, and Israel) are Semitic, Indo-European, or Turkic. The languages in the two marginal subareas of Southwest Asia (in Afghanistan and in Turkey and the Caucasus Mountains between the Black and Caspian seas) far exceed the languages of Europe in genetic diversity.

At one time Sumerian, which is preserved in written form, was spoken as the first language of civilization in the ancient Middle East; this language was neither Semitic nor Indo-European (see also Sumerian language). Early literary traditions and literacy for the elite began in this central area of Southwest Asia and extended from the Sumerian, Old Persian, and Akkadian literatures to Asia Minor (Hittite) in the north and to the Nile River (Egyptian) in northeastern Africa. Akkadian and Persian seem to have been the first two languages put to wide international use.


Indo-Iranian languages.

Almost all of the score of living languages of the Iranian subgroup of the Indo-Iranian branch of Indo-European are spoken in Southwest Asia, occasionally extending into neighbouring regions. Persian has three separate literary standards that are not confined to the countries in which they centre (Iran, Afghanistan, and Tajikistan). More than half of the speakers of Pashto live in Afghanistan and the rest in South Asia. Kurdish is spoken in an area extending southward from southern Armenia into Turkey, Syria, Iran, and Iraq. Perhaps three-fifths of the speakers of Balochi live in Iran and southern Afghanistan. Several other Iranian languages (or dialects) have many fewer speakers; these include Luri and Bakhtyari, spoken only in Iran, and Munji and Shughni, spoken largely in Afghanistan, with only a few of their speakers in Pakistan or Tajikistan. One Iranian language, Yaghnabi, is spoken only in Tajikistan. Three Iranian languages are spoken almost entirely in the Caucasus: Tat, Talysh (with some speakers in Iran), and Ossetic.

The half dozen Nuristani languages spoken in Afghanistan and part of Pakistan, sometimes classified as members of the Dardic subgroup of Indo-Aryan, more recently have been classified by some scholars as constituting a separate branch of Indo-Iranian. In addition, some Landa (Indo-Aryan) speakers also live in Afghanistan. Two very divergent dialects of another Indo-Aryan language, Romany, are spoken in Southwest Asia--Armenian Romany and Asiatic Romany (the dialect of the Palestinian Gypsies). For more information on the Iranian and Indo-Aryan languages, see Indo-Iranian languages.


Other languages.

The sole language of another branch of Indo-European, Armenian, is spoken predominantly in Armenia but also in Syria, Georgia, Russia, Azerbaijan, Iran, and other parts of Southwest Asia (see Armenian language).

The long-extinct languages of the Anatolian branch of Indo-European, including Hittite, were once spoken in Southwest Asia (see Anatolian languages).

Five Turkic languages are spoken primarily in Southwest Asia: Turkish, spoken in Turkey and surrounding countries largely to the north; Azerbaijani, spoken chiefly in Azerbaijan and Iran; and Kumyk, Karachay, and Nogay, spoken in the Caucasus. Three Turkic languages spoken predominantly in North Asia also are spoken in Central Asia: Uzbek, Turkmen, and Kyrgyz.

One language of the Mongolian family, Mogol, is spoken in Southwest Asia (in Afghanistan), and Brahui, a Dravidian language, has a small fraction of its speakers in Afghanistan and Iran.


Caucasian languages.

In addition to the Indo-European and Turkic languages spoken in the Caucasus, there are more than 30 languages belonging to three Caucasian language families. These may be related remotely to each other in a Caucasian phylum, in which the Northeast Caucasian family is more clearly related to the Northwest Caucasian family than the South Caucasian (Kartvelian) family is to either. Georgian, a South Caucasian language, is the most widely known of the Caucasian languages, with speakers in Georgia, Azerbaijan, and adjacent parts of Turkey and Iran; it is the only Caucasian language with a long literary tradition. Other South Caucasian languages are Laz and Svan. The Northwest Caucasian (Abkhazo-Adyghian) languages include Kabardian, Abkhaz, Abaza, Adyghian, and Ubykh (the last now extinct). The approximately 25 languages of the Northeast Caucasian (Nakho-Dagestanian) family are spoken by people living mostly in the Dagestan republic in Russia. These languages include Chechen, Ingush, Avar, Dargin (Dargwa), Lak, Lezgian, and Tabasaran. There is some scholarly disagreement concerning the classification of the Caucasian languages (see Caucasian languages).


Semitic languages.

Five Semitic languages are still spoken in Southwest Asia: Arabic; Hebrew, primarily in Israel; dialects of East Aramaic, still spoken in Israel, Syria, Iran, Iraq, and Armenia; West Aramaic dialects, still spoken in Lebanon and Syria; and Modern South Arabic, spoken in southern Saudi Arabia and on nearby islands. Of the extinct Semitic languages, the best known are Phoenician, Akkadian (Babylonian and Assyrian), Syriac, Moabite, and Ugaritic (see Semitic languages).


LANGUAGES OF EAST ASIA

Languages in East Asia are those traditionally spoken in China, Japan, and Korea--i.e., those that occupy the region between North Asia and Southeast Asia. A conservative genetic classification reflects immense genetic diversity for East Asia by claiming that Ainu, Japanese, and Korean are related neither to each other nor to any other language in East Asia and that the Chinese languages (or dialects) belong in one family, Miao-Yao (Hmong-Mien) languages in another, and Tai languages in still another. A liberal genetic classification leaves Ainu isolated, includes Korean and Japanese in the Altaic family, and classifies some or all of the other groups as Sino-Tibetan.

Three general types of syntax, which partly overlap the liberal genetic classification, can be distinguished among languages in East Asia. First, Ainu is isolated syntactically as well as genetically. The second type is shared by Korean and Japanese. All Chinese languages are strikingly alike in syntax, and this third type is approximated among some non-Chinese languages of the Sino-Tibetan family and among some languages of Southeast Asia whose genetic classification is tentatively indeterminate.


Altaic languages.

Languages from three of the major families of North Asia are spoken in China. Uighur, a Turkic language, is spoken in Sinkiang and Kansu provinces of China as well as in the countries of Central Asia and in southwestern Mongolia. Another Turkic language, Kyrgyz, has some speakers in China. Manchu is the best known of the Manchu-Tungus languages and that with the longest literary tradition (dating from as early as 1599). After the Manchu established the last Chinese dynasty in 1644, their language was gradually replaced in most parts of China by Mandarin--except for formal and ceremonial occasions--but it is still spoken in scattered localities in Northeast and Northwest China.

Striking similarities in syntax have led some linguists to postulate a remote relationship between the Altaic languages and Korean and, less frequently, Japanese.


Korean, Japanese, and Ainu.

Korean is spoken in Korea as well as by sizable populations in China and Japan (see Korean language).

The Japanese language family includes, besides Japanese, several mutually unintelligible dialects spoken on the Ryukyu Islands by people who are bilingual in mainland Japanese. Japanese is spoken by some 125 million people in Japan and by small groups in Brazil and the United States, especially in Hawaii (see Japanese language).

Ainu, the remaining language in insular East Asia for which not even a remote relationship with other languages seems likely, originally was spoken in Japan and on Sakhalin Island and the Kuril Islands. By the late 20th century it was virtually extinct, with only a few speakers in northern Japan.


Sino-Tibetan languages.

Chinese languages (dialects).
Most important in terms of numbers of speakers and their influence on the other languages in East and Southeast Asia are the Chinese languages (often called dialects). In terms of mutual intelligibility among adjacent dialects, there are several Chinese languages: Mandarin, Wu, Yüeh (Cantonese), Hsiang, Kan, Hakka, and Min (or North Min and South Min). Mandarin is the native language of more than 70 percent of the Chinese and is spoken as a second language by many of the native speakers of the other languages, both Chinese and non-Chinese, in China. It has traditionally been the language of administration.

Although speakers of two different Chinese languages may not be able to understand one another when they talk, communication between them is possible in writing; conversely, the same written message is read aloud differently by speakers of different Chinese languages. The functional advantages of Chinese writing explain its perseverance for four millennia, but these advantages are partly offset by the difficulties each generation must experience in learning the thousands of character signs that are needed for literacy. Traditionally most Chinese citizens were believed to be illiterate, but, with simplified characters and romanization, the majority of the people in China are now literate. The Chinese languages are notable for their enormous numbers of speakers, and Mandarin has the largest number of speakers of any of the world's languages (some 800 million native speakers).

A remote relationship in one family (Sino-Tibetan) has been postulated for the Chinese languages and all the other non-Altaic families that have languages spoken in China. In spite of the fact that there is no doubt that all these languages bear many similarities to Chinese, current knowledge fails to reveal to what extent such similarities might be the result of borrowing rather than common origin.

Tibeto-Burman languages.
The Tibetan, or Tibetic, language group includes at least two Tibetan proper languages spoken in Tibet, Nepal, and India: Central Tibetan, including Lhasa, the standard dialect of Tibet, and Western Tibetan. In addition there are many other languages in Nepal, India, and Bangladesh that are closely related to Tibetan proper.

More distantly related Tibeto-Burman languages are spoken in East Asia over the borders of Myanmar (Burma); these languages, often called Burmic, include dialects of the Burmese-Lolo subgroup (including Burmese) and the Kachin subgroup. For more information on the Chinese, Tibetan, and Burmic languages, see Sino-Tibetan languages.


Tai and Miao-Yao (Hmong-Mien) languages.

All the languages of the Kam-Sui language group, which is related to the Tai family, are spoken in China (in Kweichow, Hunan, and Kwangsi provinces), with some dialects extending into Southeast Asia. Speakers of Miao-Yao languages are scattered over south-central China and extend into Vietnam, Laos, and Thailand. Dialects of the Miao language include Red Miao, White Miao, Green or Blue Miao, and the more divergent Black Miao. The Yao languages are Yao (also called Man or Mien), Laka, and Punu.


LANGUAGES OF SOUTHEAST ASIA (INCLUDING AUSTRONESIAN)

Southeast Asia is generally taken to be a region that includes both a mainland subregion, south of China and east of India, and an insular subregion, which includes the insular half of Malaysia, all of Indonesia, and the Philippines. Virtually all the languages of insular Southeast Asia belong to a single language family--Austronesian (Malayo-Polynesian). Mainland Southeast Asia, on the other hand, has various representatives from the Austroasiatic, Tai, and Sino-Tibetan language groups. Hence, genetic diversity is greater in mainland than in insular Southeast Asia. Austronesian languages extend out of Southeast Asia to the most distant culture areas in Oceania (Polynesia and Micronesia), where they are the only languages known aboriginally. One modern Austronesian language (Malagasy) is spoken on the African side of the Indian Ocean in Madagascar.

Curiously enough, it is in Melanesia, between the Bismarck Archipelago and Vanuatu, that the most diverse Austronesian languages are spoken today; this provides grounds for the conjecture that the Proto-Austronesian language was spoken there millennia ago and that the daughter languages diversified as their speakers migrated throughout much of the world, with Malay and Cham backtracking eventually to mainland Southeast Asia, out of which the ancestors of Proto-Austronesian speakers must have come.

In general, the name of the country and the name of the national language are the same in both insular and mainland Southeast Asia. Thus, Pilipino (based on Tagalog) is the name of one of the national languages of the Philippines, even though Pilipino is learned as a second language by most Filipinos. The fear in all of Southeast Asia of indirect neocolonial domination motivates continued distrust of the old languages of colonialism--English, French, Dutch, Spanish--and now also of Japanese and Russian. A pidgin-creole--Neo-Melanesian, or Melanesian Pidgin English--is used as a lingua franca by speakers of Austronesian and other languages from southern Papua through Melanesia into Micronesia.

Though the languages in the mainland subregion of Southeast Asia are genetically diverse, they show widespread ranges of the same typological features--such as the use of distinctive tones and classifiers--among unrelated or only remotely related languages.


Austroasiatic languages.

The Mon-Khmer language family includes more than 80 languages--more than any other family that is centred primarily or entirely in Southeast Asia. Mon-Khmer languages are spoken from Myanmar to Vietnam. In Cambodia, Khmer (Cambodian) is the official language; its speakers also are found in Thailand. Mon also is spoken in Thailand and Myanmar.

The language of mainland Southeast Asia with the greatest number of speakers is Vietnamese, spoken in Vietnam and by smaller numbers of speakers in Cambodia, Thailand, and Laos. Muong, spoken in the central highlands of northern Vietnam, is recognized as a separate, but related, language and shows far less Chinese influence.

Classified as a northern group of the Mon-Khmer family are several languages spoken in Myanmar (east of Mandalay), northwestern Thailand, northern Laos, and to a lesser extent in northern Vietnam and in southwestern China. These include languages of the Palaungic, or Palaung-Wa, branch, spoken in Myanmar, Thailand, China, and Laos; and the Khmuic branch, spoken in Laos, Thailand, and Vietnam.

Another branch of Mon-Khmer, the Aslian branch, is composed of three small groups of related languages in Malaysia. They are the North Aslian, or Semang, subbranch, spoken in the inland area of northern and central Malaysia and across the border in Thailand; the Senoic, or Sakai, subbranch, with speakers south of Kuala Lumpur on the coast and inland farther south; and the Semelaic, or South Aslian, subbranch, spoken south of the Senoic languages. Data on the Nicobarese languages, spoken on the Nicobar Islands, suggest that they form a distinct branch (Nicobarese) of the Mon-Khmer family (see Austroasiatic languages).


Tai and Sino-Tibetan languages.

At least a dozen languages of the Tai language family are spoken in Southeast Asia: Thai, or Siamese, in Thailand; Lao, in Thailand, Laos, and Cambodia; Yuan, in Thailand; Shan, in Myanmar; Black Tai (Tai Noir), in Laos and Vietnam; Khün and Khamti, in Myanmar; and White Tai (Tai Blanc), Tho (Tay), Nung, and Kelao (Ch'i-lao), all in Vietnam.

Many millions of Chinese are distributed throughout Southeast Asia; of these, more than 7 million are in Thailand, 1.7 million in Malaysia, 1 million in Vietnam, and smaller numbers in Myanmar, Cambodia, and Laos.

Of the other language groups in the Sino-Tibetan family in Southeast Asia, the Burmese-Lolo (Burmish) group has the widest distribution and the greatest number of speakers. Burmese is spoken as a second language by perhaps 90 percent of those in Myanmar who have another first or native language. The Lolo languages are spoken in Myanmar, Thailand, Laos, and Vietnam; they include Lisu, Lahu, Akha, Mung, Punoi, Pyen, and others, a few of which extend into northeastern India. Karen languages are spoken in Myanmar and Thailand and include Sgaw, Pho, Pa-o (or Taungthu), and Palaychi. Most of the languages of the Kuki-Chin (Kukish) group are spoken in Myanmar. Kachin languages also are spoken in Myanmar (see Sino-Tibetan languages and Tai languages).


Andamanese.

Andamanese, consisting of the languages spoken on the Andaman Islands, may be a language isolate, but it is believed to be remotely related to the Papuan languages.

Austronesian languages.

There are perhaps 500 languages in the Austronesian (Malayo-Polynesian) family, spoken in Malaysia and the Indonesian archipelago; the Philippines; parts of Vietnam, Cambodia, and Taiwan; on the main island groups of the South and Central Pacific; on New Guinea; and on Madagascar. According to one classification, these languages include, in addition to small subgroups, at least two large subgroups: Western Austronesian (or Indonesian) and Eastern Austronesian (often called Oceanic), which includes the Polynesian languages and some of the Melanesian and Micronesian languages. Those Austronesian languages spoken on the Southeast Asian mainland (Malay in Malaysia, Cham and eight other languages mostly in Vietnam, with speakers of some of them also in Cambodia) belong to a Western Indonesian subgroup, which includes Javanese, Sundanese, and Malay, including Bahasa Indonesia, the national language of Indonesia. Closely related to the Western Indonesian subgroup is the subgroup consisting of about 100 languages of the Philippines and a few languages of northern Borneo and northern Celebes (including Tagalog, Cebuano, Hiligaynon, and Ilocano). Classed with the Western Indonesian and Philippine languages are a small group of languages of Celebes (e.g., Buginese and Makasarese), a few languages of Borneo, and Malagasy (used on Madagascar).

The languages of Polynesia, including Maori in New Zealand, Tongan, Tahitian, and Hawaiian, form a subgroup that is part of a larger Eastern Oceanic subgroup of more than 100 languages, which includes, besides the Polynesian languages, such languages as Fijian and a number of languages of the Solomon Islands. At least seven of the languages of Micronesia (including Gilbertese, Chuukese, and Pohnpeian) form another subgroup.

More than 100 Austronesian languages are spoken throughout New Guinea, and more than 100 Austronesian languages, not counted as Eastern Oceanic, are spoken on smaller islands of Melanesia. Those few with as many as 10,000 speakers are all used as lingua francas in wider areas than those of their native speakers (Dobu in the D'Entrecasteaux Islands, Banoni in southwestern Bougainville, Panayati in the Louisiade Archipelago). Among the Austronesian languages still spoken on Taiwan are Ami, Atayalic, Paiwan, and Bunun. There is some scholarly disagreement concerning the classification of the Austronesian languages (see Austronesian languages).


Other languages.

Southeast Asia also has a significant immigrant population that includes speakers of Dravidian languages (Tamil, Telugu, Malayalam, and others) and speakers of Indo-European languages (Punjabi, Bengali, Pashto, and Sinhalese). (See below Dravidian languages and Indo-European languages.)


NON-AUSTRONESIAN LANGUAGES OF OCEANIA

In effect, the non-Austronesian language areas of New Guinea and Australia together constitute a wedge in the midst of three Austronesian areas: Polynesia to the east, Micronesia to the north, and Indonesia to the west. A few non-Austronesian languages are found on the Indonesian islands nearest to New Guinea (on Halmahera as well as on Timor and Alor).

An exceptionally liberal genetic classification claims that the many non-Austronesian languages in Melanesia and the few in Indonesia all belong to one phylum. Conservative classifications recognize several or even many different language families and avoid the older name for them (Papuan), because it might suggest either that the unrelated families of non-Austronesian languages are branches of one Papuan family or else that non-Austronesian languages are found only on the island of New Guinea. On the other hand, no classification is challenged when it is said that all Australian languages are ultimately related and additionally that they are related neither to Austronesian nor to non-Austronesian languages outside Australia.

In Melanesia, which essentially constitutes the non-Austronesian world beyond Indonesia, there is much contact between Austronesian and non-Austronesian languages. Many of the Melanesian societies are multilingual, especially those in New Guinea; in addition to their native language, speakers often learn a few secondary languages--those of their immediate neighbours or, most frequently, Neo-Melanesian (a pidgin-creole with an English-based lexicon) or both.

In part of Papua New Guinea, Police (or Hiri) Motu, a pidgin based on an Austronesian language, is used as a lingua franca far beyond the territory of the few thousand native speakers of Motu. In Australia the same interest in mastering a multiplicity of languages is widespread, and Aborigines have developed another English-based pidgin-creole, quite different from Neo-Melanesian. Another parallel between Australian languages and the non-Austronesian languages north of Torres Strait is the disinclination of both to recognize or develop any one dialect of a language as a standard.


Papuan languages.

About 740 Papuan or non-Austronesian languages extend from the Santa Cruz Islands north and west into the Solomon Islands and the Bismarck Archipelago, across New Guinea to Halmahera, Timor, and Alor. Until the late 1950s all discussions of the languages of New Guinea that treated more than small, closely related groups of languages stressed the fact that the hundreds of languages spoken in a comparatively small area seemed to be completely unrelated to each other except for a few groups of immediate neighbours. Until then, little was known about more than a few of the languages of New Guinea. This situation was changed in the 1960s, with the publication of further survey work in the Highlands region of Papua New Guinea, which stated explicit relationships among a large group of languages. This group was classified the Central New Guinea macrophylum (use of the term macrophylum indicates that the languages are less closely related than those of a language family or stock).

There remain a number of families and isolated languages that seem not to be related to other Papuan languages. A liberal classification presented by the American linguist Joseph Greenberg in 1971, however, treats all the Papuan languages as genetically related in an Indo-Pacific phylum, which also includes Andamanese. Most Papuan languages are spoken by only a few hundred to a few thousand speakers (see also Papuan languages).


Australian Aboriginal languages.

All the Aboriginal languages of Australia are remotely related to each other. A few dozen of the 260 or so Australian languages still spoken account for 90 percent of the total number of speakers. Scores of languages are effectively, if not actually, extinct. The greatest diversity among the languages is found in extreme northern and northwestern Australia (Arnhem Land and Kimberley district); a single remaining family (Pama-Nyungan), with 177 languages, is distributed over the rest of Australia (see Australian Aboriginal languages).

In grammatical typology the non-Austronesian languages north of Torres Strait are heterogeneous, while the Australian languages are syntactically homogeneous and almost identical in patterns of sound combinations. Both Australian languages and non-Austronesian languages have dialects that are linked in a chain such that speakers at either end do not understand the vocabulary of speakers at the other end, although speakers of adjacent dialects can understand each other.

The available data on the two or more languages that were spoken on Tasmania until the later part of the 19th century show a typical Australian sound system, but they have not been linked convincingly to the Australian languages.


LANGUAGES OF AFRICA

Languages that came into Africa from another homeland include, among others, all the European languages associated with 19th-century colonialism. Although the majority of countries in Africa became independent in the 1960s, they continued to use the European languages introduced during the colonial period alongside the numerous languages indigenous to Africa.

Languages from Southwest Asia preceded the languages of European colonization: migrations of peoples to North Africa brought the Ethiopians almost three millennia ago and the Arabic speakers many centuries ago. The Phoenician circumnavigation of Africa in ancient times left traces--Phoenician coins--on the coasts but none in the interior, and long ago migrants from Indonesia reached Madagascar, 250 miles off the African coast. Before and during the colonial period, Arab and Indian traders reached East Africa, where today a few Indo-Aryan languages are spoken among Asians.

The interior of Africa was not known to any non-Africans before the colonial period, but its prehistory can now be partially reconstructed. For example, there is evidence that the homeland of the protolanguage of the numerous Bantu languages was in Cameroon or an adjacent area in West Africa (or in both areas); that a prehistoric migration brought the Bantu speakers to Central and East Africa; and that the movements of these Bantu speakers forced the speakers of San and Khoisan languages to leave their homeland around Lake Victoria and move south to the Kalahari.

In all the postcolonial nations today, either English or Arabic or French serves both as an international language and as a functioning national language. The question still unresolved for many African nations concerns which of their indigenous languages to develop through writing and to standardize as the official language or languages of education and of the political state. The numerous pidgin-creoles, such as Krio, are recent and colonial in inspiration; Sango in the Central African Republic is surely indigenous but not so surely a pidgin-creole. Most of the dozen or so languages used in trade, such as Swahili in East Africa and Hausa in West Africa, tend to have great changes in vocabulary like pidgin-creoles, but they are not classified as pidgin-creoles; instead they are varieties of normal languages that function as lingua francas. Lingua francas of one sort or another are a prerequisite for the markets found throughout rural Africa.

Despite the genetic diversity of the languages of South Africa and the even greater diversity in West Africa, a part of each of these subregions can be shown, on the basis of typology, to be a linguistic area. Thus, most linguists have found that most languages in West Africa distinguish vocabulary items and word elements by tone; in South Africa the clicks characteristic of Khoisan languages also are found among neighbouring Bantu languages such as Xhosa and Zulu. The early use of typology to anticipate genetic classification, however, led to the claim that Africa was full of mixed languages--e.g., Mbugu in Tanzania. But Mbugu, despite having borrowed prefixes and culture words from Bantu, can be shown to have a single line of origin--to have descended from a single protolanguage (Proto-Cushitic)--on the basis of its grammatical constituents (in particular its pronouns and verb forms) and basic vocabulary items that are cognate with other Cushitic languages.


Afro-Asiatic (Hamito-Semitic) languages.

The Hamito-Semitic language family (considered a phylum by some) includes five branches spoken across North Africa from Mauritania to Somalia and beyond into Southwest Asia: Chadic, Semitic, Cushitic, Berber, and the now extinct Egyptian-Coptic. The Chadic branch consists of more than 100 languages spoken in Nigeria, Niger, Cameroon, Ghana, Chad, and the Central African Republic. By far the most widespread is Hausa, estimated to be spoken by as many as 35,000,000 people, for about a third of whom it is a second language. (see also Index: Chadic languages)

Five Semitic languages are spoken in Africa, if modern colloquial Arabic is counted as a single language throughout its range across North Africa and the Arabian Peninsula and if Gurage in Ethiopia also is counted as a single language. The Semitic languages in Ethiopia include Amharic, Tigrinya, and Gurage (but the people grouped as Gurage may be speaking several separate languages). Tigré and Tigrinya are spoken in Eritrea.

Cushitic languages are spoken in Eritrea, Ethiopia, Somalia, The Sudan, Tanzania, and Kenya. The languages with the greatest number of speakers are Gallinya, Somali, Sidamo, Hadya, and Afar-Saho. Some scholars consider a group of languages traditionally classified as Cushitic to be a separate branch of Hamito-Semitic, called Omotic. Spoken in Ethiopia, they include Walamo, with far more speakers than the other Omotic languages, Ari, Shako, Zaysse, and others with only a few thousand or a few hundred speakers.

The languages of the Berber branch are spoken from the western desert of Egypt west to the Atlantic and extend to Senegal on the coast and to northern Nigeria in the interior. Guanche, an extinct language that may have been an offshoot of Berber, was formerly spoken on the Canary Islands. Berber languages include Shluh, spoken in Morocco; Tamashek (Tuareg) in Algeria, Libya, Niger, and Mali; and Tamazight in Morocco and Algeria (see below Hamito-Semitic languages).


Nilo-Saharan languages.

The Nilo-Saharan languages in central interior Africa include the Chari-Nile languages and others that are not closely related to each other or to the Chari-Nile group. (The validity of this grouping has been questioned.) The largest Chari-Nile division, Eastern Sudanic, includes more than 60 languages spoken from Chad to Kenya and Tanzania; it includes a group of languages often classified as a separate family or branch (Nilo-Hamitic), which appears in some classifications as a branch of the Hamito-Semitic family rather than the Nilo-Saharan.

Among the major Eastern Sudanic languages are Teso in Uganda and Kenya, Dinka in The Sudan, Luo in Kenya and Tanzania, and Lango in Uganda. Only three of the 30 or so languages of the Central Sudanic subgroup of Chari-Nile are spoken by groups of some 100,000 people: Sara in Central African Republic and Chad, Lugbara in Uganda and Zaire, and Mangbetu in Zaire.

Among the Nilo-Saharan languages that are not classified as Chari-Nile is the Saharan group. Kanuri, its largest member, is spoken by several million people in Nigeria, Niger, Cameroon, and Chad. In the Maba group, Masalit is spoken in The Sudan. Songhai, often classified as a language isolate, is spoken by about a million people in Niger, Mali, and Burkina Faso. Fur, also sometimes considered to be an isolate, is spoken mostly in The Sudan. (see also Index: Saharan languages, Maba languages, Songhai language)


Niger-Congo languages.

Languages in the Niger-Congo (or Niger-Kordofanian) family are spoken all across Africa from Mauritania to Kenya and south into South Africa. There are almost 900 Niger-Congo languages, which have been classified into six genetic subgroups. The Bantu (Bantoid) languages of the Benue-Congo subgroup far outnumber those of any other family in Africa, both in terms of number of languages and in terms of total number of speakers. At least 15 Bantu languages are each spoken by more than 3,000,000 people; the following each have more than 5,000,000 speakers: Rwanda, Shona, Kongo, Luba-Lulua (Luba-Kasai), Xhosa, and Zulu.

Other subgroups in the Niger-Congo family include only a few dozen languages, such as those in the Mande subgroup in West Africa, which are spoken from Mauritania to Ghana (including Bambara, Mende, and Vai). The Gur (Voltaic) languages, spoken from Mali and Côte d'Ivoire to Nigeria, include Mossi, with some 4,000,000 speakers, and numerous other languages with significantly fewer speakers. The West Atlantic languages, spoken from Senegal to Nigeria, include Fulani, Wolof, Temne, and several other languages of less numerical import. Of the languages of the Adamawa-Eastern subgroup, spoken from The Sudan to Cameroon, only Sango, through its use as a lingua franca, may be known by more than 1,000,000 people. The Kwa subgroup of Niger-Congo includes Twi (Akan), Yoruba (in Nigeria and Benin), and Igbo (also known as Ibo; in Nigeria). Some scholars link the Kordofanian languages of North and South Kurdufan provinces in The Sudan with the Niger-Congo languages in a Niger-Kordofanian phylum. (see also Index: Mande languages, Voltaic languages, Kwa languages)


Khoisan languages.

The Khoisan family consists of about four dozen languages spoken in southern Africa and two click languages (Sandawe and Hadza) spoken in Tanzania that are not closely affiliated with any one group in the Khoisan family. Uncertainties in the number of languages and the number of language groups arise from the profusion of labels for various groups and the lack of detailed linguistic comparisons among large numbers of them. Most of the Khoisan languages have been considered to be on the verge of extinction, if not known to be already extinct, but recent estimates of the numbers of peoples grouped on the basis of their culture (Khoikhoin and San) show many thousands of speakers. The Khoisan language estimated to have the most speakers is Nama. For more information on the Nilo-Saharan (Chari-Nile), Niger-Congo, and Khoisan languages, see below African languages. (see also Index: Nama language)


LANGUAGES OF THE AMERICAS

Languages indigenous to the Americas were brought from Asia by the forebears of modern American Indians (including Eskimos), who left Asia after the dog was domesticated but before other animals were domesticated. Something is known about the culture of these peoples but nothing about their languages, which are attested only from the time of contact with European languages.

Today there are six European languages in the Americas that serve as languages of both education and government administration. (Several Indian languages, however, function in this dual role--Guaraní of Paraguay, Greenlandic of Greenland, and Quechua and Aymara of Peru.) These official languages and their number of primary political divisions are Spanish (18); Portuguese (1); Dutch (2)--1 in Latin America and 1 in the Caribbean; English (2 in North America and 11 in the Caribbean); French (1 in North America and 3 in the Caribbean); and Danish (1 in Greenland). Before the colonial period in Latin America and during the first century or two of that period, the following American Indian languages could also be classed as official or semiofficial: Nahuatl (Nahua), the language of the Aztec in Mexico and Central America; Chibcha-Muisca in Colombia; Quechua, the language of the Inca, in the Andean area; Tupí in Brazil; and Guaraní in and around Paraguay. In addition to American Indian languages, two pidgin-creole languages are official in their own political divisions, Sranan (Taki-Taki) in Suriname and Papiamento in Curaçao. Other pidgin-creoles in the Caribbean, such as Haitian Creole, are being increasingly written.

Genetic diversity among languages of continental-sized areas can be expressed in terms of the number of minimum genetic classes taken as the usual basis for discussion by specialists of that area. Research may lead to a downward (or upward) revision, and a new number of minimum genetic classes is used as a basis for further discussion. For North America (north of Mexico) and for the 20th century, the basis for discussion has shifted three times so far: from about 50 families in the classification of the U.S. scholar J.W. Powell to six phyla in the classification of the U.S. anthropological linguist Edward Sapir, which was revised at the 1964 Conference on North American Indian Languages by splitting and reclassification (e.g., of Sapir's Hokan-Siouan) and by merging (e.g., the Muskogean family and a few isolates were added to Algonquian [Algonkian] in the Macro-Algonquian phylum). This third classification is summarized below. Proposals for a minimum number of genetic classes in South America range from more than 100 families to three phyla (in a recent liberal classification).

The Plains Indian sign language (hand talking) is still known, but Chinook Jargon and other pidgin-creoles in North America fell into disuse as soon as American Indians became bilingual in English, French, or Spanish.


North and Central American Indian languages.

For North America north of Mexico, the summary of culture areas (before any American Indians were relocated by Europeans) by the U.S. anthropologist Harold E. Driver is a convenient basis on which to superimpose the various ways in which language classifications (genetic and typological) combine with cultures that are ecologically adapted to each of ten areas--the Arctic, Subarctic, Northwest Coast, Plateau, Plains, Prairies, East, California, Great Basin, and Southwest. The three variables (genetic, typological, and cultural) coincide approximately in the Arctic (the one language family, Eskimo-Aleut, does not include typologically diverse languages, but it does spread over a culture area that is not entirely homogeneous). In the Subarctic two language families are represented, Algonkian and Athabascan, which are distinct typologically as well as genetically. Northwest Coast and adjacent Plateau languages are genetically very diversified but surprisingly homogeneous in a diffusional kind of phonological typology. The languages in the treeless Plains and the midwestern Prairies are genetically and typologically diverse; all the language families represented, except Caddoan, are intrusive in the sense that their homelands lie outside the Plains and Prairie areas.

Language families in the East give an impression of a little typological similarity combined with considerable genetic diversity. On the opposite coast, California is surprisingly homogeneous in culture and in language typology but heterogeneous in genetic classification of languages. There are few languages and only two language families represented in the Great Basin, which is homogeneous in all respects. The adjacent Southwest is anomalous in all three variables considered here. Where it is culturally homogeneous, as between Pueblo societies, it is genetically and typologically diverse in language: four different language families are represented in Pueblo societies. Non-Pueblo societies of the Southwest are diverse culturally as well as linguistically. (see also Index: Pueblo Indians)

Eskimo-Aleut.
The three languages of the Eskimo-Aleut family are still spoken in their prediscovery areas from Greenland to Siberia and also on Komandor Island between the Aleutians and Kamchatka (see below Languages of the Americas: Eskimo-Aleut languages).

Athabascan.
About 20 languages of the Athabascan family are still spoken in four different culture areas: the Yukon and Mackenzie areas of the Subarctic (the centre of Athabascan diversity, with 17 living languages, including Chipewyan-Slave-Yellowknife and Carrier), the Northwest Coast (where only Hupa, Tolowa, and Chasta Costa may still be spoken), the Southwest (where the Navajo dialect of what may be considered a single Southwestern Apachean language has more speakers than any other Indian language north of Mexico), and the Plains (where two Athabascan languages are more recently intrusive--Sarcee [Sarsi] from the Subarctic and Kiowa Apache from the Southwest). Three language isolates spoken in the Northwest Coast (Eyak, Tlingit, and Haida) are remotely related to Athabascan in the Na-Dené phylum, but Eyak is so much more closely related to the Athabascan family that it might be considered a divergent member of the family. (see also Index: Na-Dené languages)

Algonkian.
The Algonkian family includes 13 languages still spoken, which belonged in the culture areas of the eastern Subarctic (e.g., Cree, Ojibwa, Micmac, Malecite), the Prairies (e.g., Fox, Potawatomi), the Plains (e.g., Blackfoot, Cheyenne, Arapaho), and the East (where most Algonkian languages became or are now becoming extinct, with only the removed Shawnee and Delaware surviving in Oklahoma). Remotely related to the Algonkian languages in a Macro-Algonkian phylum are languages spoken farther to the south in the East--the Muskogean family (including Choctaw-Chickasaw and Creek-Seminole) and several language isolates that are no longer spoken, as well as two almost extinct languages of the Northwest Coast that are more closely related to Algonkian (Wiyot and Yurok). (see also Index: Muskogean languages)

Macro-Siouan.
The Macro-Siouan phylum is named for its most extensive component, the Siouan family, the extant languages of which belong in the Plains and Prairies, including Dakota, Crow, Winnebago, and Omaha-Osage. (The Siouan languages of the East, such as Ofo and Biloxi, are no longer spoken.) Less widely distributed than Siouan are the Iroquoian family (six languages, largely of the East, including Cherokee and Mohawk), the Caddoan family (Caddo in the East, Wichita and Pawnee in the Prairies), and two language isolates of the East (Catawba and Yuchi), which are more closely related to Siouan than to the other families in Macro-Siouan.

Hokan.
The Hokan phylum includes several small families and a number of language isolates scattered from the Northwest Coast through California, with extensions into the Great Basin and the Southwest, and as far south as Meso-America. Hokan languages spoken by the greatest numbers of speakers include those in two families in Mexico, the Tlapanecan and Tequistlatecan, and in the Yuman family in Arizona and California.

Penutian.
The Penutian phylum is the only group of languages in North America for which relationships with languages in South America have been traced convincingly. The Penutian languages are thus distributed from the Northwest Coast and Plateau areas through California (with a possible extension into the Southwest) and Meso-America into Bolivia, Chile, and Argentina. Many of the Penutian languages north of Mexico are either no longer spoken or are spoken by fewer than 50 people. In Meso-America, however, many native languages have a considerable number of speakers; e.g., Mixe, in the Zoque family, has over 77,000 speakers, and the Mayan family includes some languages with several hundred thousand speakers, such as Maya, Quiché, Kekchí, Cakchiquel, and Mam.

Aztec-Tanoan.
The Aztec-Tanoan phylum consists of two families: the Tanoan (Kiowa-Tanoan) family with three languages in the Southwest, including those spoken by the Taos and the Santa Clara, and one language in the Plains (Kiowa); and the Uto-Aztecan family, with about a score of languages spoken from the Plateau and California into Meso-America, with relatively late extensions into the Plains. California Uto-Aztecan languages include Cahuilla and Luiseño; Great Basin languages include Paiute and Shoshoni, with the Ute and Comanche dialects in the Plains; Southwestern languages include Hopi and Pima-Papago; Meso-American languages include Nahuatl, the language of the descendants of the Aztecs. The million speakers of the several varieties of Nahuatl far outnumber the total number of speakers of all the other Uto-Aztecan languages.

Oto-Manguean.
Languages of one North American phylum are located entirely in Meso-America--the Oto-Manguean phylum, consisting of five small families. The languages with the largest number of speakers are Otomí, Mixtec, and Zapotec.

Unaffiliated languages.
In North America one large family (the Salish family in the Northwest Coast and Plateau) and several smaller families and language isolates (such as the Wakashan family in the Northwest Coast and Tarascan in Meso-America) remain undetermined in phylum affiliation. Remote relationships that have been proposed for some of these are in conflict with other proposed relationships, with no overwhelming evidence presented for any one of the proposals. (see also Index: Tarasco language)


South American Indian languages.

Language names for South America are much more numerous than those for North America, but information on actual languages is generally sporadic and often lacking entirely. Even when the list of names is reduced to 350 for languages said to be still spoken, the data to which the names refer consist, for the most part, of brief word lists; or nothing more may be known than the fact that a tribe X is said to speak differently from a tribe Y. Though it is possible to know that certain languages are probably closely related, it is not always possible to say how closely; i.e., whether they might be dialects of, or occasionally just different names for, the same language. At the opposite extreme of genetic relationship, it is clear that there are large groups of remotely related languages, but the paucity of data makes possible conflicting proposals. For at least one group of languages, those of the high cultures of South America--the Inca and the Aymara--and some of their neighbours, the problem of establishing genetic relationship is complicated by the problem of sorting out borrowings among them.

The Andean-Equatorial phylum includes the greatest number of non-extinct languages (almost 200) and the three South American Indian languages with the greatest number of speakers (Quechua, Guaraní, and Aymara). The living Andean-Equatorial languages constitute some 14 families and several language isolates. The Arawakan family includes the largest number of languages--some 100--and has the widest distribution: across northern South America from French Guiana to Colombia and southward as far as Paraguay; formerly, Arawakan languages also were spoken in Central America and the islands of the Caribbean. Most Arawakan languages are spoken by not more than a few hundred people. More than two dozen languages of the Tupian family are still spoken over a large part of South America, principally south of the Amazon River. Tupian languages include Guaraní (Tupí-Guaraní), which is spoken in a number of dialects by about 4,000,000 people in Paraguay, Brazil, Argentina, and Bolivia. Quechua, of the Quechumaran group, is spoken by some 8,000,000 people in Peru, Ecuador, Colombia, Bolivia, Argentina, and Chile. Some Quechua dialects are so divergent that they might be regarded as separate languages. The other Quechumaran language group, Aymaran, is spoken by more than 1,000,000 people in Peru and Bolivia. Most other languages in the Andean-Equatorial phylum are spoken by only a few thousand persons. (see also Index: Guaraní language, Aymaran languages)

The Ge-Pano-Carib phylum includes almost as many languages still spoken as the languages of the Andean-Equatorial phylum, but the former are all spoken by relatively small tribes, so that the total number of speakers of these languages is only a small fraction of the number of speakers of Andean-Equatorial languages. In terms of numbers of languages, the largest family in the Ge-Pano-Carib phylum is the Cariban (Carib) family, with some 60 languages still spoken in Venezuela, French Guiana, Guyana, Suriname, Brazil, and Colombia. Cariban languages were also formerly spoken in the Caribbean islands. Most Cariban languages have fewer than 1,000 speakers. The other large family in the phylum, the Macro-Ge family, includes more than 25 languages in Brazil. (see also Index: Macro-Ge languages)

The languages of the Macro-Chibchan phylum, of which some 39 may still be spoken, are distributed from Guatemala and Honduras southward into, and possibly beyond, Peru. The largest component of the phylum is the Chibchan family, of which 16 languages are still spoken from Nicaragua to northwestern Colombia--these include Cuna, spoken on the San Blas Archipelago of Panama as well as on the mainland of Panama and Colombia; Guaymí in Panama; and Páez in Colombia. (see also Index: Chibchan languages)

For further information on the Indian languages of the Americas, see below Languages of the Americas: North American Indian languages; Meso-American Indian languages; South American Indian languages.

For information on numbers of speakers by country, see the Britannica World Data: Language section in the BRITANNICA BOOK OF THE YEAR. (C.F.V./F.M.V./Ed.)


INDO-EUROPEAN LANGUAGES

Indo-European is the name of a family of languages that by 1000 BC were spoken over most of Europe and in much of Southwest and South Asia; since the second half of the 15th century the Indo-European languages have spread to most other inhabited parts of the world. The term Indo-Hittite is used by scholars who believe that Hittite and the other Anatolian languages are not just one branch of Indo-European but rather a branch coordinate with all the rest put together; thus, Indo-Hittite has been used for a family consisting of Indo-European proper plus Anatolian. As long as this view is neither definitively proved nor disproved, it is convenient to keep the traditional use of the term Indo-European.

Overview of the language family

LANGUAGES OF THE FAMILY

The well-attested languages of the Indo-European family fall fairly neatly into the 10 main branches listed below; these are arranged according to the age of their oldest sizable texts.

Anatolian.
Now extinct, Anatolian was spoken during the 1st and 2nd millennia BC in what is presently Asian Turkey and northern Syria. By far the best-known of its members is Hittite, the official language of the Hittite empire, which flourished in the 2nd millennium. Very few Hittite texts were known before 1906, and their interpretation as Indo-European was not generally accepted until after 1915; the integration of Hittite data into Indo-European comparative grammar has, therefore, been one of the principal developments of Indo-European studies in the 20th century. The oldest Hittite texts date from the 17th century BC, the latest from approximately 1200 BC. For more information, see Anatolian languages. (see also Index: Anatolian languages, Hittite language)

Indo-Iranian.
Indo-Iranian comprises two main subbranches, Indo-Aryan (Indic) and Iranian. Indo-Aryan languages have been spoken in what is now northern and central India and Pakistan since before 1000 BC. Aside from a very poorly known dialect spoken in or near northern Iraq during the 2nd millennium BC, the oldest record of an Indo-Aryan language is the Vedic Sanskrit of the Rigveda (Rgveda), the oldest of the sacred scriptures of India, dating roughly from 1000 BC. Examples of modern Indo-Aryan languages are Hindi, Bengali, Sinhalese (spoken in Sri Lanka), and the many dialects of Romany, the language of the Gypsies (Rom). (see also Index: Iranian languages)

Iranian languages were spoken in the 1st millennium BC in present-day Iran and Afghanistan and also in the steppes to the north, from modern Hungary to East (Chinese) Turkistan. The only well-known ancient varieties of Iranian languages are Avestan, the sacred language of the Zoroastrians (Parsis), and Old Persian, the official language of Darius I (ruled 522-486 BC) and Xerxes I (486-465 BC) and their successors. Among the modern Iranian languages are Persian (Farsi), Pashto (Afghan), Kurdish, and Ossetic. For more information, see Indo-Iranian languages. (see also Index: Avestan language)

Greek.
Greek, despite its numerous dialects, has been a single language throughout its history. It has been spoken in Greece since at least 1600 BC, and, in all probability, since the end of the 3rd millennium. The earliest texts are the Linear B tablets, some of which may date from as far back as 1400 BC (the date is disputed), and some of which certainly date to 1200 BC. This material, very sparse and difficult to interpret, was not identified as Greek until 1952. The Homeric epics--the Iliad and the Odyssey--probably dating from the 8th century BC, are the oldest texts of any bulk. For more information, see Greek language. (see also Index: Greek language)

Italic.
The principal language of the Italic group is Latin, originally the speech of the city of Rome and the ancestor of the modern Romance languages: Italian, Romanian, Spanish, Portuguese, French, and so on. The earliest Latin inscriptions apparently date from the 6th century BC, with literature beginning in the 3rd century. Scholars are not in agreement as to how many other ancient languages of Italy and Sicily belong in the same branch as Latin. For more information on Latin, the languages derived from it, and the other languages that belong to or are sometimes included in the Italic branch of Indo-European, see Italic languages and Romance languages. (see also Index: Latin language)

Germanic.
In the middle of the 1st millennium BC, Germanic tribes lived in southern Scandinavia and northern Germany. Their expansions and migrations from the 2nd century BC onward are largely recorded in history. The oldest Germanic language of which much is known is the Gothic of the 4th century AD. Other languages include English, German, Dutch, Danish, Swedish, Norwegian, and Icelandic. For more information, see Germanic languages and English language.

Armenian.
Armenian, like Greek, is a single language. Speakers of Armenian are recorded as being in what now constitutes eastern Turkey and Armenia as early as the 6th century BC, but the oldest Armenian texts date from the 5th century AD. For more information, see Armenian language. (see also Index: Armenian language)

Tocharian.
The Tocharian languages, now extinct, were spoken in the Tarim Basin (in present-day northwestern China) during the 1st millennium AD. Two distinct languages are known, labeled A (East Tocharian, or Turfanian) and B (West Tocharian, or Kuchean). One group of travel permits for caravans can be dated to the early 7th century, and it appears that other texts date from the same or from neighbouring centuries. These languages became known to scholars only in the first decade of the 20th century; they have been less important for Indo-European studies than has Hittite, partly because their testimony about the Indo-European parent language is obscured by 2,000 more years of change and partly because Tocharian testimony fits fairly well with that of the previously known non-Anatolian languages. For more information, see Tocharian languages.

Celtic.
Celtic languages were spoken in the last centuries before the Christian era over a wide area of Europe, from Spain and Britain to the Balkans, with one group (the Galatians) even in Asia Minor. Very little of the Celtic of that time and the ensuing centuries has survived, and this branch is known almost entirely from the Insular Celtic languages--Irish, Welsh, and others--spoken in and near the British Isles, as recorded from the 8th century AD onward. For further information, see Celtic languages.

Balto-Slavic.
The grouping of Baltic and Slavic into a single branch is somewhat controversial, but the exclusively shared features outweigh the divergences. At the beginning of the Christian Era, Baltic and Slavic tribes occupied a large area of eastern Europe, east of the Germanic tribes and north of the Iranians, including much of present-day Poland and what was formerly the western Soviet Union--namely, Belarus, Ukraine, and westernmost Russia. The Slavic area was in all likelihood relatively small, perhaps centred in what is now southern Poland. But in the 5th century AD the Slavs began expanding in all directions. By the end of the 20th century the Slavic languages were spoken throughout much of eastern Europe and northern Asia. The Baltic-speaking area, however, contracted, and by the end of the 20th century Baltic languages were confined to Lithuania and Latvia.

The earliest Slavic texts, written in a dialect called Old Church Slavonic, date from the 9th century AD; the oldest substantial material in Baltic dates to the end of the 14th century, and the oldest connected texts to the 16th century. For more information, see Baltic languages and Slavic languages.

Albanian.
Albanian, the language of the present-day republic of Albania, is known from the 15th century AD. It presumably continues one of the very poorly attested ancient Indo-European languages of the Balkan Peninsula, but which one is not clear. For more information, see Albanian language. (see also Index: Albanian language)

In addition to the principal branches just listed, there are several poorly documented extinct languages of which enough is known to be sure that they were Indo-European and that they did not belong in any of the groups enumerated above (e.g., Phrygian, Macedonian). Of a few, too little is known to be sure whether they were Indo-European or not (e.g., Ligurian).

ESTABLISHMENT OF THE FAMILY

Shared characteristics.
The chief reason for grouping the Indo-European languages together is that they share a number of items of basic vocabulary, including grammatical affixes, whose shapes in the different languages can be related to one another by statable phonetic rules. Especially important are the shared patterns of alternation of sounds. Thus the agreement of Sanskrit ás-ti, Latin es-t, and Gothic is-t, all meaning 'is,' is greatly strengthened by the identical reduction of the root to s- in the plural in all three languages: Sanskrit s-ánti, Latin s-unt, Gothic s-ind 'they are.' Agreements in pure structure, totally divorced from phonetic substance, are, at best, of dubious value in proving membership in the Indo-European family.
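The shared alternation just described can be checked mechanically. The following is a minimal sketch in Python, not a method used by the article: it takes only the six segmented forms cited above and assumes that the part before the hyphen is the root. The names FORMS, root, and shows_reduction are hypothetical helpers introduced for illustration.

```python
import unicodedata

# Segmented forms exactly as cited in the text: (language, 'is', 'they are').
FORMS = [
    ("Sanskrit", "ás-ti", "s-ánti"),
    ("Latin", "es-t", "s-unt"),
    ("Gothic", "is-t", "s-ind"),
]

def root(segmented):
    """Return the part before the hyphen (assumed here to be the root)."""
    # Normalize so that an accented vowel counts as a single character.
    text = unicodedata.normalize("NFC", segmented)
    return text.split("-", 1)[0]

def shows_reduction(singular, plural):
    """True if the singular root is vowel + s and the plural root is bare s."""
    sg_root, pl_root = root(singular), root(plural)
    return len(sg_root) == 2 and sg_root.endswith("s") and pl_root == "s"

# One True/False verdict per language.
reduction = {lang: shows_reduction(sg, pl) for lang, sg, pl in FORMS}
print(reduction)
```

The point of the check is that the same reduction holds in all three languages at once, which is what makes the agreement hard to attribute to chance.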

Table 1 gives examples of typical vocabulary items widely shared within the Indo-European family that have been decisive in establishing the family. A blank indicates that the language in question does not use the item in accordance with the given meaning or that its word for that meaning is unknown.

Similarities in grammatical endings are shown in Table 2 by samples of noun declension and verb inflection in some of the more archaic languages that have retained the inflectional endings of Indo-European in relatively unchanged form. Note that Old Lithuanian -i and -u were nasalized vowels, representing a continuation from the earlier forms *-in and *-un. (The asterisk marks a form that is not actually found in any document or living dialect but is reconstructed as having once existed in the prehistory of the language.) (see also Index: Lithuanian language)

The statable phonetic rules referred to earlier are not always obvious without careful observation. Note that the English dental consonants t, d, and th do not correspond in a straightforward manner to the Greek dental sounds t, d, and th; that is, English t does not occur where Greek t appears, nor English d where Greek has d. But the relationships between the sounds are not random either: it is not the case that English t corresponds to Greek t in one word, to d in a second, and to th in a third, with no discernible pattern. Rather, where Greek has initial t, English has th, as in that and three; where Greek has d, English has t, as in tree, two, and ten; and where Greek has th, English has d, as in daughter. Note also that phonetic similarity as such is not needed to establish relationship. Thus, many of the Armenian words in Table 1 look quite different from the related words in other Indo-European languages, but here too regular rules of correspondence can be found; e.g., Greek initial p corresponds to Armenian h or zero (lack of a consonant) in the words meaning 'fire,' 'father,' 'foot,' and 'five.' (see also Index: English language)
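The Greek-English dental correspondences can be written down as a small lookup table and tested against word pairs. This is an illustrative sketch in Python under stated assumptions: the English words are those named in the text, but the Greek transliterations (treis, to, duo, deka, drus, thugater) are simplified forms supplied here for illustration and are not taken from Table 1. The function names are likewise hypothetical.

```python
# Correspondences stated in the text: Greek initial dental -> English dental.
GREEK_TO_ENGLISH_DENTAL = {"t": "th", "d": "t", "th": "d"}

def initial_dental(word):
    """Return the word's initial dental (th, t, or d), or None if it has none."""
    for onset in ("th", "t", "d"):  # test the digraph th before plain t
        if word.startswith(onset):
            return onset
    return None

def follows_rule(greek, english):
    """True if the pair shows the regular correspondence for its Greek dental."""
    g = initial_dental(greek)
    e = initial_dental(english)
    return g is not None and GREEK_TO_ENGLISH_DENTAL[g] == e

# Assumed, simplified transliterations paired with the English words in the text.
PAIRS = [
    ("treis", "three"),
    ("to", "that"),
    ("duo", "two"),
    ("deka", "ten"),
    ("drus", "tree"),
    ("thugater", "daughter"),
]

results = {english: follows_rule(greek, english) for greek, english in PAIRS}
print(results)
```

A pair matched only on surface similarity, such as treis with tree, fails the check, which illustrates why regular correspondence rather than resemblance is the criterion.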

Sanskrit studies and their impact.
The ancient Greeks and Romans readily perceived that their languages were related to each other, and, as other European languages became objects of scholarly attention in the late Middle Ages and the Renaissance, many of these were seen to be more similar to Latin and Greek than, for example, to Hebrew or Hungarian. But an accurate idea of the true bounds of the Indo-European family became possible only when, in the 16th century, Europeans began to learn Sanskrit. The massive similarities between Sanskrit and Latin and Greek were noted early, but the first person to make the correct inference and state it conspicuously was the British Orientalist and jurist Sir William Jones, who in 1786 said in his presidential address to the Bengal Asiatic Society that Sanskrit bore to both Greek and Latin

a stronger affinity, both in the roots of verbs, and in the forms of grammar, than could possibly have been produced by accident; so strong, indeed, that no philologer could examine them all three, without believing them to have sprung from some common source, which, perhaps, no longer exists. There is a similar reason, though not quite so forcible, for supposing that both the Gothick [i.e., Germanic] and the Celtick, though blended with a very different idiom, had the same origin with the Sanscrit; and the old Persian might be added to the same family . . . .

Nineteenth-century linguists firmly established the connections that Jones had elucidated and broadened the family to include Slavic, Baltic, and other language groups. In 1816 Franz Bopp, the German philologist, presented his Über das Conjugationssystem der Sanskritsprache in Vergleichung mit jenem der griechischen, lateinischen, persischen und germanischen Sprache ("On the System of Conjugation in Sanskrit, in Comparison with Those of Greek, Latin, Persian, and Germanic"), in which the relation of these five languages was demonstrated on the basis of a detailed comparison of verb morphology (structure). Two years later there appeared the Undersøgelse om det gamle Nordiske eller Islandske Sprogs Oprindelse ("Investigation of the Origin of the Old Norse or Icelandic Language"), by the Danish philologist Rasmus Rask, completed in 1814. This work demonstrated methodically the relation of Germanic to Latin, Greek, Slavic, and Baltic. (Rask included Celtic a few years later.) In 1822 the second edition of the first volume of Jacob Grimm's Deutsche Grammatik ("Germanic Grammar") was published; in this grammar were discussed the peculiar Indo-European vowel alternations called Ablaut by Grimm (e.g., English "sing, sang, sung"; or Greek peíth-o 'I persuade,' pé-poith-a 'I am persuaded,' é-pith-on 'I persuaded'). In addition, Grimm tried to find the principle behind the correspondences of Germanic stop and spirant consonants (the first made with complete stoppage of the breath, and the second made with constriction of the breath but not complete stoppage) to the consonants of other Indo-European languages. The sound changes implied by these correspondences have become known as Grimm's law. Examples of it include the stop consonant p in Latin pater corresponding to the spirant consonant f in father, and the correspondences between English and Greek t, d, and th discussed above.

Bopp demonstrated in 1839 that the Celtic languages were Indo-European, as had been asserted by Jones. In 1850 the German philologist August Schleicher did the same for Albanian, and in 1877 another German philologist, Heinrich Hübschmann, showed that Armenian was an independent branch of Indo-European, rather than a member of the Iranian subbranch. Since then, the Indo-European family has been enlarged by the discovery of Tocharian and of Hittite and the other Anatolian languages, and by the recognition, with the aid of Hittite, that Lycian, known and partly deciphered already in the 19th century, belongs to the Anatolian branch of Indo-European.

The Indo-European character of Tocharian was announced by the German scholars Emil Sieg and Wilhelm Siegling in 1908. The Norwegian Assyriologist Jørgen Alexander Knudtzon recognized Hittite as Indo-European on the basis of two letters found in Egypt (translated in Die zwei Arzawa-Briefe [1902; "The Two Arzawa Letters"]), but his views were not generally accepted until 1915, when Bedřich Hrozný published the first report of his own decipherment of the much more copious material that had meanwhile been found in the ruins of the Hittite capital itself.

The first full comparative grammar of the major Indo-European languages was Bopp's Vergleichende Grammatik des Sanskrit, Zend, Griechischen, Lateinischen, Litthauischen, Altslawischen, Gotischen und Deutschen (1833-52; "Comparative Grammar of Sanskrit, Zend, Greek, Latin, Lithuanian, Old Slavic, Gothic, and German"). But this and August Schleicher's shorter Compendium der vergleichenden Grammatik der indogermanischen Sprachen (1861-62; "Compendium of the Comparative Grammar of the Indo-European Languages") were rendered obsolete by the major breakthrough of the 1870s, when scholars--prompted largely by the discoveries of a group of German scholars known as Neogrammarians--realized that sound correspondences are not merely rules of thumb that do not have to be strictly observed, but that apparent exceptions to sound laws can often be accounted for by stating them more accurately or by reconstructing additional different sounds in the parent language. The difference between Gothic d in fadar 'father' and þ in broþar 'brother,' for example, both corresponding to t in Sanskrit, Greek, and Latin, proved to be correlated with the original position of the accent, a discovery known as Verner's law (named for the Danish linguist Karl Verner). Thus, d appears when the preceding syllable was originally unaccented (fadar : Greek patér-, Sanskrit pitár- ), and þ occurs when the preceding syllable was originally accented (broþar : Greek phrater- 'member of a clan,' Sanskrit bhratar-).
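
The conditioning stated by Verner's law, as summarized above for Gothic d and þ, amounts to a single conditional rule. The Python sketch below is a deliberately minimal illustration: the function name and the boolean encoding of the accent are assumptions made for this example, and it covers only the two Gothic outcomes discussed in the text.

```python
def gothic_reflex_of_t(preceding_syllable_accented: bool) -> str:
    """Gothic outcome of the sound that appears as t in Sanskrit, Greek, and Latin.

    Per the rule stated in the text: þ after an originally accented syllable,
    d after an originally unaccented one.
    """
    return "þ" if preceding_syllable_accented else "d"


# Gothic word -> was the syllable before the consonant originally accented?
EXAMPLES = {
    "fadar": False,   # Greek patér-, Sanskrit pitár-: accent follows the consonant
    "broþar": True,   # accent on the first syllable in Greek and Sanskrit
}

for word, accented in EXAMPLES.items():
    print(word, "->", gothic_reflex_of_t(accented))
```

Once the position of the accent is known from Greek or Sanskrit, the apparent exception becomes predictable, which is the sense in which sound laws were found to be exceptionless when stated accurately.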

The knowledge and opinions that had accumulated by the end of the 19th century are largely incorporated in the German linguist Karl Brugmann's Grundriss der vergleichenden Grammatik der indogermanischen Sprachen (2nd ed., 1897-1916; "Outline of Comparative Indo-European Grammar"), which remains the latest full-scale treatment of the family.

THE PARENT LANGUAGE: PROTO-INDO-EUROPEAN

By comparing the recorded Indo-European languages, especially the most ancient ones, much of the parent language from which they are descended can be reconstructed. This reconstructed parent language is sometimes called simply Indo-European, but in this article the term Proto-Indo-European is preferred.

Phonology.
Consonants.
Proto-Indo-European probably had 15 stop consonants. In the following grid these sounds are arranged according to the place in the mouth where the stoppage was made and the activity of the vocal cords during and immediately after the stoppage:

A "labial" sound is made with the lips, and a "dental" sound with the tip of the tongue against the back of the teeth. The "palatal" and "velar" sounds were probably made by contact between the back of the tongue and the soft palate--more toward the front of the mouth in the case of the palatals and more toward the back in the case of the velars (compare Arabic kalb 'dog' versus qalb 'heart'). The "labiovelar" sounds were made by contact between the back of the tongue and the soft palate with concomitant rounding of the lips. "Voiceless" designates sounds made without vibration of the vocal cords; "voiced" sounds are pronounced with vibration of the vocal cords. The exact pronunciation of the "voiced aspirates" is somewhat uncertain; they were probably similar to the sounds transcribed bh, dh, and gh in Hindi.

Correspondences pointing to the voiced labial stop b are rare, leading some scholars to deny that b existed at all in the parent language. A minority view holds that the traditionally reconstructed voiced stops were actually glottalized sounds produced with accompanying closure of the vocal cords. The status of the velar stops k, g, and gh has likewise been questioned. The earlier view that Proto-Indo-European had a series of voiceless aspirated stops ph, th, ḱh, kh, and kwh has largely been abandoned. (Aspirated consonants are sounds accompanied by a puff of breath.) There was one sibilant consonant, s, with a voiced alternant, z, that occurred automatically next to voiced stops. The existence of a second apical spirant, þ (presumed pronunciation like that of th in English thin), is extremely uncertain.

There is general agreement that Proto-Indo-European had one or more additional consonants, for which the label "laryngeal" is used. These consonants, however, have mostly disappeared or have become identical with other sounds in the recorded Indo-European languages, so that their former existence has had to be deduced mainly from their effects on neighbouring sounds. Hence, the laryngeal sounds were not suspected until 1878, and even then they were rejected by most scholars until after 1927, when the Polish linguist Jerzy Kuryłowicz showed that Hittite often has h (perhaps a velar spirant like the ch in German ach) in places where a laryngeal had been posited on the evidence of the other Indo-European languages. There is still considerable disagreement about how many laryngeals there were, what they sounded like, what traces they left, and how best to symbolize them. Most scholars now believe there were three, which can be written H1, H2, and H3. Of these, H1 may have been h or a glottal stop; H2 was perhaps a pharyngeal spirant like Arabic h in hams 'five'; H3, whatever its other features, was probably voiced. The principal traces they left outside Anatolian are in the quality and length of neighbouring vowels, H2 changing a neighbouring e to a, and probably H3 changing it to o, while all laryngeals lengthened a preceding vowel in the same syllable. In Anatolian, H2 and H3 remained as h, at least in some positions.

When laryngeals between consonants disappeared, a vowel sometimes remained, as in Greek stásis, Sanskrit sthitis, Old English stede 'a standing (place)' from Proto-Indo-European *stH2tis. Before the advent of the laryngeal theory, a separate Proto-Indo-European vowel ə (called schwa indogermanicum) was reconstructed to account for these correspondences.

Finally, there were the nasal sounds n and m, the liquids l and r, and the semivowels y and w. When y and w occurred between consonants, they were replaced by the vowels i and u. The nasals and liquids functioning as nuclei of syllables in this position (like the final sounds of English bottom, button, bottle, butter) are traditionally written n̥, m̥, l̥, r̥. Some scholars dispense with these diacritical marks and with the distinction between syllabic i and u and nonsyllabic y and w, but this obscures certain distinctions, such as that between -wn- in *kwnsu 'among dogs,' Sanskrit shvasu, and -un- in *tund- 'shove,' Sanskrit tundate.

Vowels.
The vowel system of Proto-Indo-European consisted of the following sounds:

In forming front vowels, the highest point of the tongue is in the front of the mouth; for back vowels, that point is in the back. High vowels are those in which the tongue is highest--closest to the roof of the mouth; mid vowels are made with the tongue between the extremes of high and low.

The four mid vowels participated in a pattern of alternation called "ablaut." In the course of inflection and word formation, roots and suffixes could appear in the "e-grade" (also called "normal grade"; compare Latin ped-is 'of a foot' [genitive singular]), "o-grade" (e.g., Greek pód-es 'feet'), "zero-grade" (e.g., Avestan fra-bd-a- 'forefoot,' with -bd- from *-pd-), "lengthened e-grade" (e.g., Latin pēs 'foot' [nominative singular] from *pēd-s), and/or "lengthened o-grade" (e.g., English foot, Old English fōt).
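
The five ablaut grades listed above differ only in the vowel placed between the consonants of the root. A minimal Python sketch of that idea follows; the function name and the splitting of a root into "onset" and "coda" are assumptions made for illustration, and the sketch says nothing about which grade a given inflected form actually required.

```python
# Vowel inserted into the root for each ablaut grade named in the text.
GRADE_VOWELS = {
    "e-grade": "e",
    "o-grade": "o",
    "zero-grade": "",
    "lengthened e-grade": "ē",
    "lengthened o-grade": "ō",
}


def ablaut_grades(onset: str, coda: str) -> dict[str, str]:
    """Return every grade of a root given the consonants before and after its vowel."""
    return {grade: onset + vowel + coda for grade, vowel in GRADE_VOWELS.items()}


# The root meaning 'foot', whose consonants are p and d.
for grade, form in ablaut_grades("p", "d").items():
    print(f"{grade}: *{form}-")
```

For the root meaning 'foot' this yields the shapes ped, pod, pd, pēd, and pōd, matching the Latin, Greek, Avestan, and English examples cited above.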

There is some evidence for a similar pattern of alternation involving a, ā, and zero. Most instances of apparent a and ā, however, arose by "coloration" of e under the influence of a preceding or following H2 (e.g., Greek ag- 'lead' comes from *H2eg-, sta- 'stand' comes from *stH2-). Some cases of o, ō, and ē are likewise of laryngeal origin (e.g., Greek op- 'see' comes from *H3ekw-, dō- 'give' comes from *deH3-, thē- 'put' comes from *dheH1-). Among the high vowels, i and u did not participate in ablaut alternations but rather functioned primarily as the syllabic realizations of the consonants y and w, as in *leykw- 'leave,' zero-grade *likw-, parallel to *derk- 'see,' zero-grade *drk-. Long ī and ū in the recorded languages derive in large part from sequences of i or u plus laryngeal, as in Latin vīvus 'alive' from *gwiH3wós.
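
The laryngeal effects on a neighbouring e (coloration by H2 and H3, lengthening of a preceding vowel, and eventual loss of the laryngeal) can be sketched as three ordered string rewrites. The Python below is an illustration only: the function name is hypothetical, it assumes that the vowel and the laryngeal stand in the same syllable, it treats the o-coloration by H3 as certain although the text calls it probable, and it does not model laryngeals between consonants or any later changes to the other consonants of the root.

```python
import re

# Vowel quality imposed on a neighbouring e by each laryngeal (H1 leaves e alone).
COLOR = {"1": "e", "2": "a", "3": "o"}
LONG = {"e": "ē", "a": "ā", "o": "ō"}


def apply_laryngeals(root: str) -> str:
    """Apply coloration, lengthening, and laryngeal loss to a root written with H1-H3."""
    # 1. Coloration: e next to H2 becomes a, e next to H3 becomes o.
    root = re.sub(r"H([123])e", lambda m: "H" + m.group(1) + COLOR[m.group(1)], root)
    root = re.sub(r"eH([123])", lambda m: COLOR[m.group(1)] + "H" + m.group(1), root)
    # 2. Lengthening: a vowel followed by a laryngeal becomes long; the laryngeal is lost.
    root = re.sub(r"([eao])H[123]", lambda m: LONG[m.group(1)], root)
    # 3. Any remaining laryngeal (e.g., at the start of the root) is simply lost.
    return re.sub(r"H[123]", "", root)


for root in ["H2eg", "H1es", "steH2", "deH3", "dheH1"]:
    print(f"*{root}- -> {apply_laryngeals(root)}-")
```

The order of the steps matters: coloration must apply while the laryngeal is still present, since the lengthening step removes it.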

The accent just before the breakup of the parent language was apparently mainly one of pitch rather than stress. Each full word had one accented syllable, presumably pronounced on a higher pitch than the others.

Morphology and syntax.
Verbal inflection.
The Proto-Indo-European verb had three aspects: imperfective, perfective, and stative. Aspect refers to the nature of an action as described by the speaker--e.g., an event occurring once, an event recurring repeatedly, a continuing process, or a state. The difference between English simple and "progressive" verb forms is largely one of aspect--e.g., "John wrote a letter yesterday" (implying that he finished it) versus "John was writing a letter yesterday" (describing an ongoing process, with no implication as to whether it was finished or not).

The imperfective aspect, traditionally called "present," was used for repeated actions and for ongoing processes or states--e.g., *stí-stH2-(e)- 'stand up more than once, be in the process of standing up,' *mn-yé- 'ponder, think,' *H1es- 'be.' The perfective aspect, traditionally called "aorist," expressed a single, completed occurrence of an action or process--e.g., *steH2- 'stand up, come to a stop,' *men- 'think of, bring to mind.' The stative aspect, traditionally called "perfect," described states of the subject--e.g., *ste-stóH2- 'be in a standing position,' *me-món- 'have in mind.'

Verb roots were by themselves either perfective (like *steH2- 'stand' and *men- 'think') or imperfective (like *H1es- 'be'). This basic aspect, however, could be reversed by morphological devices such as ablaut, suffixation, and reduplication. The stative aspect was normally marked by reduplication and the o-grade of the root in the indicative singular; it had personal endings that were partly distinct from those of the other two aspects.

From one aspect of a given verb the shape and even the existence of the other two aspects could not be predicted; for example, *H1es- 'be' had only the imperfective aspect. Ways of forming imperfectives were especially numerous and often involved, in addition to their imperfective aspectual meaning, some other notion, such as performing the action habitually or repeatedly (iterative), or causing someone else to perform it (causative). One root could thus have several imperfective stems; so to the root *H1er- 'move' there were at least a causative form, *H1r-new- 'set in motion,' and an iterative form, *H1r-ske- 'go repeatedly.'

The Proto-Indo-European verb was also inflected for mood, by which the speaker could indicate whether he was making statements or inquiries about matters of fact; making predictions, surmises, or wishes about the future or about unreal but imagined situations; or giving commands. Compare English "If John is home now (he is eating lunch)" with the verb is in the indicative mood, discussing a matter of fact, with "If John were home now (he would be eating lunch)" with the verb were in the subjunctive mood, describing an unreal situation. There were two Proto-Indo-European suffixes expressing mood: -e- alternating with -o- for the subjunctive, corresponding roughly in meaning to the English auxiliaries 'shall' and 'will,' and -yeH1- alternating with -iH1- for the optative, corresponding roughly to English 'should' and 'would.' Verbs without one of these two suffixes were marked for mood and tense by their personal endings alone.

These personal endings basically expressed the person and number of the verb's subject, as in Latin amo 'I love,' amas 'you (singular) love,' amat 'he or she loves,' amamus 'we love,' and so on. In the imperfective and perfective aspects there were two sets of endings, distinguishing two voices: active, in which typically the subject was not affected by the action, and mediopassive, in which typically the subject was affected, directly or indirectly. Thus Sanskrit active yájati and mediopassive yájate both mean 'he sacrifices,' but the former is said of a priest who performs a sacrifice for the benefit of another, while the latter is said of a layman who hires a priest to perform a sacrifice for him. In the stative aspect there was originally no distinction of voice.

To mark mood and tense, imperfective verbs that did not have a mood suffix distinguished three subtypes of active and mediopassive endings: imperative, primary, and secondary. Verbs with imperative endings belonged to the imperative mood (used for commands)--e.g., *H1s-dhí 'be (singular),' *H1és-tu 'let him be.' Verbs with primary endings were marked as non-past (present or future) in tense and indicative in mood--e.g., *H1és-ti 'he is.' (Indicative mood signifies objective statements and questions.) Verbs with secondary endings were unmarked for tense and mood but were normally used as past indicatives (e.g., *H1és-t 'he was,' *gwhén-t 'he slew') and to fill out gaps in the imperative paradigm (e.g., *H1és-te or *H1s-té 'you [plural] were,' but also 'be [plural]'; *gwhén-te or *gwhn-té 'you [plural] slew,' but also 'slay [plural]'). To mark such forms unambiguously as past indicatives, an augment, usually consisting of the vowel e, could be prefixed--e.g., *é-gwhen-t 'he slew,' *é-H1es-t 'he was.'

Verbs in the perfective aspect without a mood suffix did not occur with primary endings and thus lacked a true present tense. Verbs in the stative aspect substituted a distinctive set of endings for those of the primary set but apparently used the imperative and secondary endings in the usual way to form a stative imperative and a stative past indicative.

Nominal inflection.
The inflectional categories of the noun were case, number, and gender. Eight cases can be reconstructed: nominative, for the subject of a verb; accusative, for the direct object; genitive, for the relations expressed by English of; dative, corresponding to the English preposition to, as in "give a prize to the winner"; locative, corresponding to at, in; ablative, from; instrumental, with; and vocative, used for the person being addressed. For examples of some of these see Table 2. Besides singular and plural number, there was a dual number for referring to two items. Each noun belonged to one of three genders: masculine, to which belonged most nouns designating male creatures; feminine, to which belonged most names of female creatures; and neuter, to which belonged only a few words for individual adult living creatures. The gender of nouns not designating living creatures was only partly predictable from their meaning.

Adjectives were nounlike words that varied in gender according to the gender of another noun with which they were in agreement, or, if used by themselves, according to the sex of the entity to which they referred; thus, Latin bonus sermo 'good speech' (masculine), bona aetas 'good age' (feminine), bonum cor 'good heart' (neuter), or bonus 'a good man,' bona 'a good woman,' bonum 'a good thing.' The neuter of an adjective was often identical with the masculine except for having different endings in the nominative and accusative cases. Feminine gender was either completely identical with the masculine or derived from it by means of a suffix, the two commonest being *-eH2- and *-iH2- (*-yeH2-).

Demonstrative, interrogative, relative, and indefinite pronouns were inflected like adjectives, with some special endings. Personal pronouns were inflected very differently. They lacked the category of gender, and they marked number and case (in part) not by endings but by different stems, as is still seen in English singular nominative "I," but oblique "my," "me"; plural nominative "we," but plural oblique "our," "us." (The oblique is any case other than nominative or vocative.)

Syntax.
Some notable features of Proto-Indo-European syntax were the non-ergative case system, in which the subject of an intransitive verb received the same case marking as the subject (rather than the object) of a transitive verb; concord (agreement) in case, number, and gender between adjective and noun; and the use of singular verbs with neuter plural subjects, as in Greek pánta rhei 'all things flow,' with the same (singular) verb as ho pótamos rhei 'the river (masculine) flows,' contrasting with hoi pótamoi rhéousi 'the rivers flow' (indicating that neuter plurals were originally collectives and grammatically singular). Proto-Indo-European word order was flexible, but basic declarative sentences typically had the structure subject-object-verb (SOV).

Lexicon and culture.
Much less is known about the parent language's vocabulary than about its phonology and grammar. Sounds and grammatical categories do not easily disappear or undergo radical change in so many daughter languages that their former existence can no longer be detected. It is relatively easy, however, for an individual word to disappear or shift meaning in so many daughter languages that its existence or meaning in the parent language cannot be confidently inferred. Hence, from the linguistic evidence alone, scholars can never say that Proto-Indo-European lacked a word for any particular concept; they can only state the probability that certain items did exist and from these items make inferences about the culture and location in time and space of the speakers of Proto-Indo-European.

Thus it is supposed that the Proto-Indo-European community knew and talked about dogs (*kwón-), horses (*H1ékwo-), sheep (*H3éwi-), and almost certainly cows (*gwów-) and pigs (*súH-). Probably all these animals were domesticated. At least one cereal grain was known (*yéwo-), and at least one metal (*H2éyos). There were vehicles (*wógho-) with wheels (*kwékwlo-), pulled by teams joined by yokes (*yugó-). Honey was known, and it probably formed the basis of an alcoholic drink (*mélit-, *médhu) related to the English mead. Numerals up through 100 (*kmtóm) were in use. All this suggests a people with a well-developed Neolithic (characterized by simple agriculture and polished stone tools) or even Chalcolithic (copper- or bronze-using) technology.

The divergence of Indo-European languages.
Linguists have not found a reliable and precise way to determine from linguistic evidence alone the date at which any set of related languages must have begun diverging. The best that can be done is to estimate the degree of difference between the languages in question, taking into account all that is known about them, and then compare this estimate with the estimated degrees of difference within families of languages--such as the Romance family--whose actual time of divergence is approximately known. Using this sort of "dead reckoning," it can be said that the earliest attested Indo-European languages--Anatolian, Indo-Iranian, and Greek--are different enough that the parent language must have been split into several distinct languages before 3000 BC, but similar enough that the first split into separate languages is not likely to have been earlier than about 4500 BC.

For further progress the linguistic findings must be correlated with archaeological evidence. Linguistic, historical, and geographic considerations suggest that the speakers of Proto-Indo-European were a relatively small and homogeneous Eurasian population group that underwent significant expansion and fragmentation in the period around 4000 BC. Some scholars believe that the Indo-Europeans were the bearers of the Kurgan (Barrow) culture of the Black Sea and the Caucasus and west of the Urals.

The Kurgan culture, however, was only one of a number of related steppe cultures extending across the entire Black Sea-Caspian Sea region, an area that was transformed about 4000 BC by the advent of horse-drawn wheeled vehicles and related innovations. It is probably best, therefore, to follow J.P. Mallory (In Search of the Indo-Europeans [1989]) in locating the speakers of Proto-Indo-European among the populations of this region, but not to attempt a more precise identification until further evidence is available.

Remote relationship of Indo-European to the Uralic languages is not improbable. Geographically, the earliest reconstructible locations of the two families are contiguous; lexically, there are strong resemblances in a number of basic words or word parts, including personal, demonstrative, interrogative, and relative pronouns, personal endings of verbs, the accusative case ending -m, and such words as those for 'water' and 'name'; typologically, the families are fairly similar--e.g., both have many suffixes, but few or no prefixes or infixes (elements inserted within words). The resemblances, however, are too few to permit the reconstruction of a common "Indo-Uralic" parent language; the two families, if they are related at all, must have separated thousands of years before the breakup of Proto-Indo-European.

If Indo-European is related to other language families--e.g., to Afro-Asiatic (which includes the Semitic languages) or to Kartvelian (which includes Georgian)--it must have diverged from them much earlier than it diverged from Uralic, because the number of cogent resemblances is much smaller. There is no significant evidence at present for a "Nostratic" superfamily embracing these and other groups.


CHARACTERISTIC DEVELOPMENTS OF INDO-EUROPEAN LANGUAGES

As Proto-Indo-European was splitting into the dialects that were to become the first generation of daughter languages, different innovations spread over different territories.

Changes in phonology.
Indo-Iranian, Balto-Slavic, Armenian, and Albanian agree in changing the palatal stops *k, *g, and *gh into spirants (s, sh, th, etc.) or affricates--e.g., Sanskrit ashri- 'sharp edge,' Old Church Slavonic ostru 'sharp,' Armenian aseln 'needle,' Albanian athëtë 'bitter' beside Greek ákros 'tip,' Latin acidus 'biting,' all from a basic element *H2ek- 'sharp, pointed.' (Spirants, also called fricatives, are sounds produced with audible friction as a result of the airstream passing through a narrow, but unstopped, passage in the mouth--e.g., English s, f, v. Affricates are sounds that begin as stops, with complete stoppage of the airstream, but are released as spirants, or fricatives--e.g., the ch in church, the j in jam.) The languages that change the palatal stops to spirants or affricates are known as "satem" languages, from the Avestan word satəm 'hundred' (Proto-Indo-European *kmtóm), which illustrates the change. The languages that preserve the palatal stops as k-like sounds are known as "centum" languages, from centum (/kentum/), the corresponding word in Latin. The satem languages are not geographically separated from one another by any recorded languages that preserve the palatals as stops; it is therefore inferred that the change to affricates (whence later spirants) occurred just once and spread over a cohesive dialect area of Proto-Indo-European.

Of the languages that share this change, however, Balto-Slavic shares with Germanic (including English) an m in certain case endings where other Indo-European languages, including Indo-Iranian, Armenian, and Albanian, have bh or a sound regularly developed from bh. Examples of the m ending include English the-m and Old Church Slavonic te-mu 'to those ones'; the bh and related sounds (ph, v, b) are illustrated in the following: Sanskrit té-bhyas 'to those ones,' Armenian noro-vk' 'with new ones,' Albanian male-ve 'to mountains,' Greek ókhes-phin 'with chariots,' Latin omni-bus 'for all.' Because Balto-Slavic and Germanic are neighbours, it is inferred that m replaced bh in these case endings just once in the parent language and that the area over which this innovation spread only partly overlapped the area that adopted affricated pronunciation of the palatals.

This pattern is general for changes dating from the time the parent language was breaking up into distinct languages. Each of the resulting languages shares some innovations with some of its neighbours, but only rarely do different innovations shared by two or more branches of Indo-European cover exactly the same territory.

Once the dialects had become differentiated enough to be distinct languages--certainly by 2500 BC in most cases--each largely went its own way, and agreements in developments since then are due either to borrowing across language boundaries (as in the notable convergences between Modern Greek, Albanian, Romanian, and the southernmost Slavic languages) or to parallel but independent workings out of the same base material.

In phonology, the most striking changes have been loss or reduction in many languages of final or unaccented syllables, and loss in several languages of certain consonants between vowels, often followed by contraction of the resulting vowel sequence. Thus words in modern Indo-European languages are often much shorter than their Proto-Indo-European ancestors--e.g., English 'four,' Armenian c'ork', colloquial Persian car 'four' from *kwetwóres; French vit (pronounced vi) 'lives' from *gwíH3weti; Russian dvestí 'two hundred' from *duwóyH1 kmtóyH1.

Changes in morphology.
Because much of the marking of Proto-Indo-European inflectional categories was done in final syllables, loss and reduction of these syllables have often had serious grammatical consequences. In the noun, loss of endings has generally led to loss or great reduction of the case and gender systems, while ways have generally been found to salvage the distinction between singular and plural. In Modern Persian, for example, where all final syllables have been lost, the old case and gender distinctions have disappeared also, but plural number is still regularly marked, either with -an (originally the genitive plural ending of some nouns) or with -ha (of obscure origin).

In the verb, where more endings originally had two syllables, loss of final syllables has had less serious consequences for morphology. Even here, however, some languages, including English, have totally or almost totally given up the marking of subject by personal endings. Compare English "I, we, you, they love" and "he, she loves" with the Spanish conjugation for 'love'--amo, amas, ama, amamos, amáis, aman--or the Russian version--ljubljú, ljúbish, ljúbit, ljúbim, ljúbite, ljúbjat.

Changes in noun inflection have generally involved simplification. Almost everywhere the dual number has been lost; in many languages the noun genders have been reduced from three to two (as in French, Swedish, Lithuanian, and Hindi) or lost entirely (as in English, Armenian, and Bengali). Only Slavic has complicated the gender system by imposing on the inherited distinctions contrasts of animate versus inanimate or of personal versus nonpersonal.

Everywhere except in the oldest Indo-Iranian languages the original eight Indo-European cases have suffered reduction. Proto-Germanic had only six cases, the functions of ablative (place from which) and locative (place in which) being taken over by constructions of preposition plus the dative case. In Modern English these are reduced to two cases in nouns, a general case that does duty for the vocative, nominative, dative, and accusative ("Henry, did Bill give John the letter?") and a possessive case continuing the old genitive ("Bill's letter"). In languages such as French and Welsh, nouns are no longer inflected for case at all. In some languages, to be sure, nouns have begun fusing with words placed directly after the nouns to create new case systems, coexisting with relics of the old. Thus, Old Lithuanian had in addition to seven inherited cases an illative (place into), made by adding -n(a) to the accusative (peklosna 'into hell'), an allative (place to, toward), made by adding -p(i) to the genitive (Jesausp 'to Jesus'), and an adessive (place at which), made by adding -p(i) to the locative (Joniep 'in John').
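
The way the Old Lithuanian secondary cases are built on inherited case forms can be sketched as a simple suffixing rule. The following is only an illustrative sketch: the function and table names are hypothetical, the base forms (peklos, Jesaus, Jonie) are obtained merely by stripping the suffix from the three examples quoted above, and the suffix variants -na and -p are chosen because they are the ones those examples show.

```python
# Illustrative sketch only. Each secondary case is modelled as
# (inherited case it is built on, suffix that fused with it).
# The suffix variants -na and -p are the ones seen in the quoted examples;
# the fuller variants (-n, -pi) mentioned in the text are not modelled.
SECONDARY_CASES = {
    "illative": ("accusative", "na"),  # place into
    "allative": ("genitive", "p"),     # place to, toward
    "adessive": ("locative", "p"),     # place at which
}


def secondary_case(case, inherited_forms):
    """Build a secondary case form from a noun's inherited case forms."""
    base_case, suffix = SECONDARY_CASES[case]
    return inherited_forms[base_case] + suffix


# Base forms assumed by stripping the suffix from the article's examples.
examples = [
    ("illative", {"accusative": "peklos"}),
    ("allative", {"genitive": "Jesaus"}),
    ("adessive", {"locative": "Jonie"}),
]

for case, forms in examples:
    print(case, secondary_case(case, forms))
```

The table makes visible that the allative and adessive share one suffix and differ only in the inherited case to which it is attached.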

Changes in the verb have been more complex. Besides loss or merger of old categories, many new forms have been created and many old forms have acquired new values. In Ancient Greek the focus of the stative aspect (perfect) has largely shifted from the present state ("he is dead") to the previous event that led to this state ("he has died"). As a result, the perfect came to mean the same as the perfective past (aorist), and it has therefore disappeared from Modern Greek. New forms created in Ancient Greek include future and future perfect tenses, based on the desiderative present forms (such as "he wants to walk") of the parent language.

In Germanic the principal new creation was the weak past tense (ending in a t or d), such as English loved, thought, German liebte, dachte, made by combining the verb stem with a past tense of the Germanic verb for 'do.' (The strong past tense formed by vowel alternations, like "sing, sang," "run, ran" comes from the Proto-Indo-European stative aspect.)

In some languages participles have come to function as finite verbs. Thus in Hindi admi larki-ko dekhta 'the man sees the girl,' dekhta 'sees' is etymologically a participle 'seeing,' agreeing in number and gender with the subject admi 'man.' In the past tense, admi-ne larki dekhi 'the man saw the girl,' the verb dekhi is etymologically a past passive participle 'seen,' agreeing in gender and number with the object larki 'girl,' and the subject is marked with an instrumental ending.

Vocabulary changes.
Changes in vocabulary have been even greater than those in sounds and grammar. Words in modern Indo-European languages have several sources. They may be recognizable loanwords, such as English skunk, chain, and inch (from Algonquian, French, and Latin, respectively); they may have been formed within the history or prehistory of the language itself, such as English radar and rightness; they may be of obscure origin, such as English drink, which is common Germanic but has no cognates outside Germanic, or boy, which is peculiar to English and Frisian; or they may be inherited words that have changed meaning, such as English merry from Proto-Indo-European *mrghú- 'short.' Only a small fraction of the vocabulary can be traced back to words that can confidently be asserted to have existed in the parent language with approximately their present meaning. The same is true, albeit in a lesser degree, even for the oldest recorded Indo-European languages. None has more than a few hundred words and roots that are clearly inherited from the parent language without essential change of meaning. Table 1 gives examples of words that have been widely retained with little change. Typically they include pronouns; nouns, verbs, and adjectives of relatively simple and ubiquitous meaning; numerals; and simple adverbs and prepositions.

Non-Indo-European influence on the family.
Indo-European languages, like all languages, have always been subject to influence from neighbouring languages, both related and unrelated.

The influence of non-Indo-European languages on the sounds and grammar of Proto-Indo-European is not demonstrable, partly because there is no direct evidence about the languages that were in contact with Indo-European before roughly 3000 BC. It can be surmised, however, that some words are loans--e.g., *péleku- 'ax,' a word for an object likely to be imported or learned of from neighbours with superior technology and which is not analyzable into a known Indo-European root plus a known Indo-European suffix.

When Indo-European languages have been carried within historic times into areas occupied by speakers of other languages, they have generally taken over a number of loanwords, as with English and Spanish in the Americas or Dutch in South Africa. Aside from the special case of pidgin and creole languages, however, there has been comparatively little effect on sounds and grammar. These have been significantly affected within historic times only when an Indo-European language has been spoken in prolonged close contact with non-Indo-European speakers, as with Ossetic (an Iranian language) in the Caucasus, or when its speakers have been very strongly influenced culturally by speakers of a non-Indo-European language, as with Persian, in which Arabic plays much the same role as Latin does in English.

In prehistoric times most branches of Indo-European were carried into territories presumably or certainly occupied by speakers of non-Indo-European languages, and it is reasonable to suppose that these languages had some effect on the speech of the newcomers. For the lexicon, this is indeed demonstrable in Hittite and Greek, at least. It is much less clear, however, that these non-Indo-European languages affected significantly the sounds and grammar of the Indo-European languages that replaced them. Perhaps the best case is India, where certain grammatical features shared by Indo-European and Dravidian languages appear to have spread from Dravidian to Indo-European rather than vice versa. For most other branches of Indo-European languages any attempt to claim prehistoric influence of non-Indo-European languages on sounds and grammar is rendered almost impossible because of ignorance of the non-Indo-European languages with which they might have been in contact. (W.C. /J.H.Ja.)

Anatolian languages

The term Anatolian languages in its most comprehensive use includes both the Indo-European and non-Indo-European languages spoken in Anatolia (Asia Minor) before the Greco-Roman period. The Anatolian languages are known only from texts of the 2nd and 1st millennia BC; the earliest evidence is that of the so-called Cappadocian tablets (19th-18th centuries BC). The term Asianic is sometimes used as an alternative designation for the Anatolian languages, but, since the discovery in 1915 that Hittite, the main Anatolian language, is an Indo-European language, there has been a tendency to use Asianic in a more restricted sense for the non-Indo-European languages that existed in Anatolia before the entry of the Indo-Europeans. These are called substratum languages.

Hattic (or Hattian), also misleadingly called Proto-Hittite, is the best-known substratum language. It is completely unrelated to Hittite and its sister languages as well as to Hurrian, a language also spoken in Anatolia.

The Anatolian group of Indo-European languages consists of Hittite, Palaic, Luwian, Hieroglyphic Luwian, Lydian, and Lycian. Hittite, Palaic, and Luwian are known from 2nd-millennium cuneiform texts found in the excavations in Bogazköy-Hattusa since 1905; Hieroglyphic Luwian is found on scattered inscriptions and seals from Anatolia (mainly the southern area) and northern Syria dating mainly from later times (i.e., between c. 1200 and 700 BC, although there are earlier examples from the empire period, c. 1400-c. 1190 BC). Lydian and Lycian are known from texts in alphabetic script from c. 600 to 200 BC. It seems fairly reasonable to add the Carian language of southwest Anatolia to this list as well as other less well documented languages like Sidetic. More to the east, in the Caucasus region centring around Lake Van, Hurrian of the 3rd and 2nd millennia BC was replaced in the 1st millennium BC by the related Urartian language. Both of these languages are definitely non-Indo-European.

Historical background of ancient Anatolia.

It is customarily assumed that the Indo-Europeans entered Anatolia around or shortly after 2000 BC, although there are no specific archaeological data that might enable scholars to specify the period of entry or the route the invaders followed. On the basis of the agricultural terminology used in Hittite, it has been suggested that the entry into Anatolia was not a warlike invasion of predominantly male groups. If such had been the case, the influence of substratum languages would have been likely, but, on the contrary, the word stems used are definitely Indo-European. The differences in the terminology used in other Indo-European subgroups indicate that the "Anatolians" seceded from the parent group at an early date, before the common agricultural nomenclature came into being. On the other hand, Hittite shares the Indo-European notion of the hereafter, pictured as a pastureland with grazing cattle "for which the dead king sets out."

There is a tendency among linguists to postulate an eastern route of entry into Anatolia by way of the Caucasus, because certain grammatical features--e.g., the loss of the feminine gender--might be explained as having been caused by prolonged contacts with Caucasian languages. It is likely that the Indo-European forebears of the later speakers of Hittite, Palaic, Luwian, and Lydian entered Anatolia together, following a common route, because the Anatolian languages share a considerable number of losses as well as innovations that presuppose a long common past.

In the central parts of Anatolia, within the bend of the Halys River (modern Turkish, Kizil Irmak), and in the northern regions, Hittite and Palaic were profoundly influenced by Hattic as a substratum language. The Hattian culture also changed the political and religious concepts of the newcomers, and a clear cultural dependency of the Indo-Europeans on the older Hattian population is evident. Some scholars have stressed the likelihood that farther to the south the Luwians might have been conversant with a different substratum. In view of the absence of textual evidence, and because knowledge of the Luwian vocabulary is rather restricted, it is perhaps not surprising that this possible substratum element escapes definition. (For the history of Anatolia in the 2nd and 1st millennia BC, see TURKEY AND ANCIENT ANATOLIA: Ancient Anatolia.)

The most important invaders of Anatolia in the "Dark Age" (after 1190 BC) were the Phrygians. Their language is definitely Indo-European, but it bears no relationship to the Anatolian subgroup. Rather, it seems akin to Thracian, Illyrian, or possibly Greek. Greek, in the second half of the 1st millennium BC, and, later, Latin, from the 2nd century onward, entered central Anatolia as languages of a ruling caste. Much earlier--beginning in Mycenaean times--the west coast had attracted Greek settlers. In the first half of the 1st millennium, the southern and northern shores also attracted Greek-speaking peoples. To the east in the Caucasus region, other Indo-Europeans, the Armenian-speaking invaders, penetrated into the former Urartian territory well before the beginning of the Persian period, probably in the 7th and 6th centuries BC. During Persian times, a Persian ruling caste entered eastern and also northeastern Anatolia and was still clearly recognizable in the Hellenistic and Roman periods (e.g., in Bithynia, Pontus, Cappadocia, and Commagene). Late data on names and scattered remarks made by Fathers of the Church indicate that until late Roman and perhaps even Byzantine times, some Anatolian dialects remained in use in certain isolated parts of the interior.

Classification of the languages.

Figure 2: Relationship between members of the Anatolian subgroup.
Research on the Anatolian languages began in 1821 with the Lycian language and passed an initially fruitful phase in the 1880s with work on Hieroglyphic Hittite (nowadays referred to as Hieroglyphic Luwian). In 1902 the Norwegian Assyriologist Jørgen Alexander Knudtzon's study on the Arzawa letters was published; these were two letters exchanged between a king of Arzawa and Pharaoh Amenhotep III that had been found in the Amarna archive. They were written in the Hittite language in cuneiform writing. In 1915 research reached a climax with the interpretation of Cuneiform Hittite by the Czech Orientalist Bedrich Hrozný. In all four of these highlights, the discovery that the texts in question were Indo-European was either clearly expressed or more discreetly implied. This conclusion was based on both the nominal (noun) declension and the verbal conjugation: the languages had a nominative ending in -s, the accusative in -n, verbal endings like -ti and -nti for the 3rd person singular and plural of the present tense, and an imperative form like estu "let it be." These features were deemed to be sufficient proof of their Indo-European origin. Study of the Anatolian subgroup of Indo-European thus began with Lycian, the last Anatolian offshoot in the temporal sequence, then passed the intermediary stage of Hieroglyphic Luwian, and reached the 2nd-millennium Hittite language in 20th-century research. For the relationship between members of the Anatolian subgroup, see Figure 2.

The non-Indo-European Hurrian and Urartian languages are related to one another, but modern research indicates that Urartian should not be considered as a direct continuation of Hurrian.

HISTORY AND DEVELOPMENT

Languages using cuneiform writing and Anatolian hieroglyphs.
Hattic.
The Hattic language appears as hattili in Hittite cuneiform texts. Called Proto-Hittite by some, it was the language of the linguistic substratum inside the Halys River bend and in more northerly regions. Apparently the Indo-European newcomers of Hittite stock were named with the same designation as their predecessors. All the Hattic material preserved by Hittite scribes belongs to the religious sphere of life: rituals (e.g., connected with the erection of a new building), incantations, antiphons, litanies, and myths. Among the Hattic interpolations in Hittite texts, there are some to which a Hittite translation has been added. It is impossible to ascertain the length of time that the Hattians had been present in Anatolia before the Indo-Europeans entered the country, but it seems certain that during the Hittite New Empire (c. 1400-c. 1190 BC) Hattic was a dead language.

Hattic studies began in 1922 with the work of the German Assyriologist Emil Forrer. In 1935, Hans G. Güterbock, a German-born Orientalist, published a large group of texts containing Hattic material and in so doing completed the publication of the Hattic texts stemming from the Winckler excavations (1905-12). Important studies on the subject have continued to appear since then.

Hittite.
The Hittite language is known from the approximately 25,000 tablets or fragments of tablets preserved in the archives of Bogazköy-Hattusa, excavated by German archaeologists beginning in 1905. In Hittite cuneiform texts, the language is referred to as nesili (nasili) "language of Nesa," or nesumnili "language of the Neshite." Earlier Hittite linguistic material may be found in the indigenous proper names and a few loanwords from the local dialect that are recorded in the Cappadocian tablets (the commercial correspondence in Assyrian of Assyrian colonists living in Anatolia, especially in the emporium at Kültepe, near modern Kayseri, between c. 1900 and 1720 BC). The data from Kültepe are sometimes referred to as "Kaneshite" (from Kanesh, the old name of Kültepe); this is obviously the modern equivalent of the word kanisumnili "language of the Kaneshite" found in a Hittite text. It is possible, or even likely, that Kanesh and Nesa do, in fact, refer to the same entity.

Hittite tablets from places outside of the Hittite capital are rare; only stray examples have been found--e.g., in Tarsus, Alalakh, Ugarit, and Amarna. These findings attest to the growth of a great Hittite empire, especially between c. 1400 and c. 1190 BC. Old Hittite, the written embodiment of the earliest Indo-European language that has been discovered so far, is known from some tablets preserved in an "old ductus" type of handwriting that was typical of copies from the Old Kingdom period (c. 1700-1500 BC). The intermediary "Dark Age" between c. 1500 and c. 1400 BC is sometimes referred to as the period of the so-called Middle Hittite language. Most of the Old and Middle Hittite texts, however, are preserved in copies from the later empire period.

The archives of Bogazköy-Hattusa have been found in various places in the citadel, in the Great Temple complex, and in the "House on the Slope." Although the majority of the texts are concerned with religious subjects (oracle texts, hymns, prayers, myths, rituals, and festival texts), these archives also contain material of historical, political, administrative, literary, and legal character. The cuneiform adopted by the Hittite scribes is a variant of a writing system of Mesopotamian origin that closely resembles the ductus and shapes prevalent in tablets of the 17th century BC (layer VII) from Alalakh (modern Atsana in southeastern Turkey). It is possible that the cuneiform script might have been introduced as a result of the Hittites inducing Syrian scribes to transfer their activities to the Hittite capital during the early part of the Old Kingdom, shortly after 1650 BC. It has also been posited, with good reason, that the newly acquired script was first used to write Akkadian and was only later employed for Hittite as well. In addition to the genres enumerated above, the "scholarly literature" deserves to be mentioned. This consists of the material considered by the scribes to be essential for their training; it includes word lists, omens, and ritual prescriptions, all reflecting an encyclopaedic approach aimed at complete coverage of the subjects concerned. The Sumerian texts found in these archives belong to this class of literature. For treaties and correspondence with foreign powers, Akkadian was used as the diplomatic language of that period. Therefore, both Sumerian and Akkadian formed part of the curriculum of the qualified scribes, these languages belonging to the "eight languages" found in the Hittite archives.

In actual fact, the first decipherer of Hittite was the Norwegian scholar J.A. Knudtzon, who pointed out in 1902 that the language of the so-called Arzawa letters (i.e., Hittite)--found in the Amarna archive--had an apparent affinity with Indo-European. Because the cuneiform script had already been deciphered, Knudtzon, and Bedrich Hrozný after him, were able to "read" their texts. Thus their discovery consisted more in the interpretation than in the actual decipherment of the written material. The first series of German excavations, lasting from 1905 to 1912, produced about 10,000 tablets. It was work on this corpus that familiarized Hrozný with the contents of these tablets and led him to his epoch-making discovery that Hittite was indeed Indo-European (1915). (See also WRITING: Cuneiform.)

Palaic.
Palaic, which appears as Palaumnili "language of the Palaite" in Hittite cuneiform texts, was the language of the region of Pala (probably Blaëne in the Greek period), in northwest Anatolia. During the Old Hittite kingdom, Pala, Luwiya, and Hattusa formed the three major provinces of the Anatolian part of the Hittite territory. From the intermediary "Dark Age" onward, Kaska nomads made their influence felt in northern Anatolia, and this resulted in a decline of importance for this region.

The Indo-European character of Palaic was first advocated by Emil Forrer (1922). Part of the text material is preserved on tablets in "old ductus." The knowledge of the limited vocabulary leaves much to be desired, but parallels--especially in the inflection of the noun, the forms of the demonstrative, relative, and enclitic pronouns, and the verbal endings--vouch for a close relationship to Hittite and Luwian.

Luwian.
Luwian (or Luvian), the language of Anatolia's southern coast, is known from texts stemming from three major periods: (1) the Hittite New Empire (c. 1400-c. 1190 BC); (2) the period of the Neo-Hittite states (c. 1190-c. 700 BC); (3) the period of the Lycian monumental inscriptions (c. 400-200 BC). In addition to the various time periods, there is also a variation in writing system--Mesopotamian cuneiform, Anatolian hieroglyphs, and an alphabet derived from a Greek source--and dialectal differentiation. There are indications that as early as the 15th and 14th centuries BC, there was a West Luwian dialect (the precursor of alphabetic Lycian) and an East Luwian dialect (the forerunner of the later Hieroglyphic Luwian of the Neo-Hittite states). Both of these differed from the Luwian found in the archives of Bogazköy-Hattusa, which was possibly a central dialect.

As in the case of Palaic, the pioneering work on Luwian written in cuneiform was done by Emil Forrer (1922). Following this work, new text materials were published in 1953, closely followed by both grammatical and vocabulary studies as well as a standard dictionary of Cuneiform Luwian (1959).

The Anatolian hieroglyphic system has a long history, with its logographic beginnings dating back to early Hittite stamp seals of the 18th and 17th centuries BC; the youngest texts seem to date from the last quarter of the 8th century BC. The geographical range of the inscriptions is great, stretching from Sipylus and Karabel in the extreme west to Alaca Hüyük and Bogazköy-Hattusa in the north, Malatya, Samsat, and Tell Ahmar (Til Barsib) in the east, and Hama and ar-Rastan in the south. During the "Dark Age" of the 16th and 15th centuries BC, the early writing grew into a fully developed writing system with logograms (word-signs), syllabic values, and auxiliary signs. During the New Empire, the script was already in use for a multitude of purposes (rock inscriptions, seals, and wooden tablets for everyday use in the temple and the army). Whether an example of the empire period such as the Aleppo inscription already reflects the Luwian language is a moot question but seems likely. It is certain that the later inscriptions of the Neo-Hittite states were in Luwian.

The first attempts to decipher Hieroglyphic Luwian, made by the British archaeologist Archibald H. Sayce, were fortunate in some fundamental details, but it was not until the 1930s that systematic and mutually stimulating research by scholars of several countries led to the establishment of a number of syllabic values for the characters as well as to a correct analysis of the sentence structure of the inscriptions. In his publication of the (bilingual) Hittite royal seals (in 1940, 1942), Hans G. Güterbock bridged the gap between the inscriptions of the empire period and the late Neo-Hittite states; the seals found in the French excavations at Ugarit (in northern Syria) served a similar purpose. The most important recent finding was the discovery in 1947 by Helmuth T. Bossert, a German archaeologist, of the Karatepe bilingual inscriptions, written in Phoenician and Hieroglyphic Luwian.

On many points the Luwian vocabulary is still an enigma. The unity between the various Luwian dialects and the close relationship of Luwian to the other members of the Anatolian subgroup, however, is secured by several linguistic parallels, especially in the singular inflection of the noun, the forms of certain pronouns, the verbal endings, and a number of lexical (vocabulary) correspondences.

Hurrian.
In earlier stages of research, the terms Mitanni language and Subarian were used as designations for Hurrian. In Hittite cuneiform texts, hurlili "language of the Hurrian" is used. In the last centuries of the 3rd millennium BC, Hurrians were already present in the Mardin region, which, from a geographical point of view, belongs to the North Mesopotamian plain. In Mesopotamian texts (from the time of the Akkad dynasty) some Hurrian personal names and glosses have been found. The customary assumption is that this non-Semitic and also non-Indo-European ethnic group had come from the Armenian mountains. During the beginning of the 2nd millennium BC, the Hurrians apparently spread over larger parts of southeast Anatolia and northern Mesopotamia. Still later, during the intermediary "Dark Age," they are supposed to have infiltrated into Cilicia and the adjacent Taurus and Antitaurus regions (Kizzuwatna in 2nd millennium texts). Before the middle of the 2nd millennium BC, an Indo-Aryan ruling caste wielded some type of authority over parts of Hurrian territory. Some names and words in ancient Near Eastern texts bear witness to their presence. Among these words are a group of technical terms related to the training of horses that found its way into Hittite treatises on that subject; they are most important from a historical point of view. After Sumerian, Akkadian, Hattic, Palaic, and Luwian, Hurrian and these Indo-Aryan glosses constitute the sixth and seventh additional languages of the Hittite archives.

Hurrian texts have been found in Urkish (Mardin region, c. 2300 BC), Mari (on the middle Euphrates, 18th century BC), Amarna (Egypt, c. 1400 BC), Bogazköy-Hattusa (Empire period), and Ugarit (on the coastline of northern Syria, 14th century BC). Amarna yielded the most important Hurrian document, a political letter sent to Pharaoh Amenhotep III. From Mari came a small number of religious texts; from Bogazköy-Hattusa, literary and religious texts; and from Ugarit, vocabularies belonging to the more "scholarly literature" described above and Hurrian religious texts in Ugaritic alphabetic script. Hurrian personal names, found in texts from many sites (Bogazköy-Hattusa, Alalakh, Ugarit, and especially Nuzu), constitute a second linguistic source of major importance.

The research on Hurrian started in the 1890s with simultaneous contributions by several scholars. Subsequently, Bedrich Hrozný (1920) and Emil Forrer (1919, 1922) discovered the presence of Hurrian material in the Bogazköy-Hattusa archives.

Urartian.
The terms Chaldean and Vannic have also been used as designations for Urartian during earlier stages of research. Urartian is not a late dialect of Hurrian but a separate language, although both stem from a common parent. During the 9th through 6th centuries BC, Urartian was used in northeastern Anatolia as the official language of the state of Urartu, which centred around the district of Lake Van but also extended over parts of Transcaucasia and into northwestern Iran and at times even into parts of North Syria. The Urartian texts are written in a variant of the Neo-Assyrian script and consist mostly of monumental inscriptions (annals, votive inscriptions related to building and irrigation activities), some small inscriptions on helmets and shields dedicated in the temple, and a few economic cuneiform tablets. Two bilingual inscriptions in Urartian and Assyrian that apparently correspond very closely provided the key to the understanding of the language; the stylistic resemblances to Assyrian texts of the same period guided the further interpretation.

Archibald H. Sayce was the first scholar to devote his attention to Urartian in the 1880s and 1890s and continued his activities until 1932. More important were the philological contributions of the German historian Carl F. Lehmann-Haupt between 1892 and 1935. The first reliable description of Urartian grammar was published by the German Orientalist Johannes Friedrich (1933).

In addition to the Urartian texts in cuneiform writing, there also existed an indigenous hieroglyphic script that is still undeciphered and is too meagrely represented to warrant a serious attempt at decipherment.

Languages using a derivative of the Phoenician or Greek alphabet.
Phrygian.
The Phrygian inscriptions and graffiti may be separated into two groups, the Old Phrygian texts in a typical Phrygian alphabet dating from c. 730-450 BC, and the New Phrygian inscriptions (sepulchral texts in the Greek alphabet) stemming from the 1st and 2nd centuries AD. The Old Phrygian texts may be divided into a central group (Midas City and the central area) and an eastern group (found in Gordium), with offshoots in a still more eastern direction marking the utmost Phrygian expansion (inscriptions in or around Hüyük near Alaca, in Bogazköy-Hattusa, and in Tyana). An important recent finding--and the longest Old Phrygian text to date--is the rock inscription near the village of Germanos (modern Soguk Çam) in Bithynia (found in 1966). The total number of Old Phrygian texts now stands at about 80; more than 50 of these are from Gordium and represent about one-quarter of the available material. There is a consensus of opinion on the Indo-European character of the Phrygian language; most scholars think that Phrygian is somehow connected to the Greek branch of Indo-European languages, although, at an earlier stage, some scholars considered the possibility of a connection with the Anatolian branch of Indo-European, and others proposed a relationship with Thracian and Illyrian.

In a publication of new material from Gordium, the U.S. archaeologist Rodney S. Young cautiously suggested that the Old Phrygian alphabet may be dependent on a prototype in use on the North Syrian or Cilician coasts. The old idea that the Phrygian alphabet was dependent on a Greek one (and not vice versa) need not be abandoned in that case. Historically, such a derivation would present no problems, because the presence of Greek settlements in these areas during the second half of the 8th century BC is amply attested to by both Assyrian annals and late Greek historical sources as well as by archaeological findings. Internal evidence from the Phrygian alphabet, presented by the French linguist Michel Lejeune, serves as proof for some researchers that that alphabet derived from the Greek one.

Lydian.
Of the more than 70 Lydian texts (e.g., sepulchral inscriptions, votive texts, many graffiti), more than half have been found by United States excavators at the Lydian capital, Sardis. Two small Greek-Lydian bilingual texts were far less helpful than the famous Aramaic-Lydian text. A few texts (about ten) may go back to the 6th or 5th centuries BC, but many more stem from the 4th century. The Lydian alphabet was derived from an East Greek prototype; the superfluous signs in the Greek alphabet were used for specific Lydian sounds, and additional signs were either borrowed from other "Anatolian alphabets" or freely created.

Important results concerning Lydian were reached using a strictly combinatory method; i.e., passages were compared that expressed similar contents in a slightly different manner in order to obtain a better understanding of the language's structure. This stage of the research culminated in a conclusive article by the Italian Piero Meriggi on the Indo-European character of Lydian (1936). Subsequently, other scholars published evaluations of the Lydian data, a dictionary, and a grammar book. The study of Lydian is hampered by many lexicological uncertainties, but there is at least a growing consensus on matters of grammar leading to the common notion that Lydian belongs to the Anatolian subgroup of Indo-European. The final obstacle to this classification as Anatolian was removed in 1959 by the Italian Onofrio Carruba, who proved that Lydian, like the other members of the Anatolian branch, does not possess a separate feminine gender. Lydian shares common features with Hittite, Palaic, and Luwian and should therefore be acknowledged, it seems, as a fourth independent member of the Anatolian subgroup.

Carian.
A great number of the more than 100 Carian inscriptions are graffiti found in Egypt that were left behind by Carian mercenaries in the service of Egyptian pharaohs of the Saitic period (664-525 BC). In recent years, more monumental inscriptions have been found in Caria itself, and Carian clay tablets have also been discovered. In the mid-20th century, several scholars concluded that Carian writing consists of a purely alphabetic script and is not a mixed system of both single letters and syllabic signs as was formerly thought. It is a likely but still unproven assumption that Carian may also be classified in the Anatolian subgroup of Indo-European. (see also Index: Carian language)

Lycian.
More than 150 Lycian monumental inscriptions have been found so far, which, with very few exceptions, are sepulchral in character. They are written in an indigenous Lycian alphabet that is based not on an East Greek prototype (as the Lydian alphabet is) but on a West Greek one. Although the Lycian coin legends are still usually dated from the period between 500 and about 360 BC, the tradition of the Lycian monumental inscriptions is now thought to have continued for a longer period, into the 3rd century BC. When research began in the first half of the 19th century, extensive use was made of a good bilingual text that offers a faithful Greek translation. In the first phase of research, which ended about 1880, Lycian was investigated by an etymological method by which it was linked either with Greek or with the Iranian languages. A more reliable combinatory method was later introduced, but the most fecund phase in the study of Lycian occurred at the end of the 19th century, when the Scandinavian school of scholars cooperated closely in the publication of several important studies. In 1945, Holger Pedersen published a synthesis of all data that seemed to indicate a relationship of Lycian with Hittite; thus Pedersen proved conclusively that Lycian belongs to the Anatolian branch of Indo-European languages. This conclusion was slightly modified when the British scholar Franz J. Tritsch (in 1950), and, later, the French scholar Emmanuel Laroche showed that Lycian should be more specifically compared to Luwian. (see also Index: Lycian language)

Sidetic.
The historical detail preserved by the Greek historian Arrian that the city of Side on the Pamphylian coast possessed a particular, indigenous language has been strikingly confirmed by legends on Sidetan coins of the 5th (?) through the 3rd (?) centuries and by five inscriptions from the 3rd and 2nd centuries BC (two of which are bilingual). There is a curious likelihood that the Sidetic alphabet was directly derived from a Semitic writing system rather than from a Greek prototype, but Greek influence was not absent, as is clearly evidenced in the Greek bilingual texts and by a loanword from Greek. The first reliable study of Sidetic was made by Helmuth T. Bossert in 1950. In the case of Sidetic, even the value of a group of signs is still undecided, and research has not yet reached a stage in which a fruitful analysis of the texts and a classification of the language are within sight. (see also Index: Sidetic language)

LINGUISTIC CHARACTERISTICS

Non-Indo-European languages.
The non-Indo-European Hattic is an agglutinative language; that is, it combines several elements of meaning into a single word. In the conjugation of verbs, it uses prefixes that are attached to the word stems, which are mostly monosyllabic or bisyllabic. Hattic nouns may have any number of syllables and take both prefixes and suffixes. There are, however, no formal features that distinguish nouns from verbs.

Both the Hurrian and the Urartian languages differentiate between stems and suffixes, but there is again no sharp distinction between noun and verb. Many suffixes may be added onto one another in a row, but within the often prolonged suffix series a detailed order is rigidly observed. Among the suffixes added to the noun, several subgroups are distinguished; one group might be compared to the case endings of the Indo-European languages. One of the most characteristic phenomena of this group is the distinction between a subject case (the "nominative") and an "agentive" case. The agentive marks the actor or subject of a transitive verb when the object is expressed by its counterpart, the "nominative." The subject case is characterized by a lack of ending on the stem; it marks the subject in nominal sentences (sentences without verbs) and occurs with intransitive verbs and as the object of transitive verbs.
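
The case pattern just described can be restated as a small decision rule. The following Python sketch is only an illustration of that rule; the function name core_case and the labels "subject" and "object" are hypothetical conveniences, not terms from the grammatical literature, and the sketch covers only the two cases named above.

```python
def core_case(role: str, transitive: bool) -> str:
    """Return the case label for a core argument under the pattern described above.

    role       -- "subject" or "object" (illustrative labels)
    transitive -- True if the clause has a transitive verb; False for
                  intransitive verbs and for nominal (verbless) sentences
    """
    if role not in ("subject", "object"):
        raise ValueError(f"unknown role: {role!r}")
    if role == "object" and not transitive:
        raise ValueError("an intransitive clause has no object")
    if role == "subject" and transitive:
        return "agentive"    # actor of a transitive verb
    return "nominative"      # endingless subject case


print(core_case("subject", True))    # agentive
print(core_case("subject", False))   # nominative
print(core_case("object", True))     # nominative
```

The point the sketch makes visible is that the subject of an intransitive verb and the object of a transitive verb receive the same, endingless case, while only the transitive actor is marked separately.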

Phrygian.
The New Phrygian texts especially favour the attribution of the Phrygian language to Indo-European. They contain such data as ios as a relative pronoun (Indo-European *io-s, Greek hos), a demonstrative pronoun that is comparable either to Indo-European *ki-/ko- or to *so (an asterisk indicates a hypothetical reconstructed form), and the form ad-daket "he adds," related to Latin addit and to Greek é-the-ka.

The Anatolian subgroup of Indo-European.
Grammatical characteristics.
Characteristic of the Anatolian languages is the absence of the dual number ("you and I") and the lack of feminine gender in the declension of nominals (nouns, pronouns, and adjectives). There is a division between an animate (common) gender and an inanimate (neuter) gender. In Hittite, a neuter may not be the subject of a transitive action verb; in that case, an -ant suffix is added before the neuter nominative ending in -s. This -s ending persists in the whole subgroup. The case system of Old Hittite is still fairly complicated, but in the subgroup as a whole there is a clear tendency toward a greater simplicity. The presence in Hittite of an archaic irregular class of nouns is a striking characteristic; e.g., there are alternate r and n stems as in uttar/uttanas "word, affair" and watar/witenas "water."

The Anatolian inflection of pronouns conforms to the traditional Indo-European pattern by being different from that of the nouns, but, at the same time, it shows some striking peculiarities. Typical Anatolian pronouns are: Hieroglyphic Luwian amu, equivalent to Lycian amu, ẽmu, emu "I, me" (compare Hittite nominative uk, accusative ammuk); and Hittite nominative zik, accusative tuk "you," as compared to ti, tu in Palaic. Some of the languages have enclitic pronouns; i.e., pronouns pronounced as part of the preceding word. A demonstrative pronoun aba- ("that," but in some member languages also "this") is present in Hittite, Palaic, Cuneiform and Hieroglyphic Luwian, Lycian (ebe-), and Lydian (ebad "here, there"), and an interrogative or relative pronoun kui- (compare Latin quis) is common to Hittite, Palaic, and Cuneiform Luwian. The corresponding terms for kui- in Hieroglyphic Luwian, Lycian, and Lydian also seem to be phonetic variants of the same original pronoun.

The Anatolian verbal system is simple: it has two moods (indicative and imperative) and two tenses (present and preterite). There are some traces--either to be classified as debris or as the nucleus for a future development--of an aorist -s fixed to the stem; e.g., kaness- "to acknowledge" (compare Greek gi-gno-sk-o); kalless- "to call" (compare Greek kaleo, aorist é-kale-s-a). (The aorist is a verb form denoting action without reference to its duration or completion.) A mediopassive "voice" is present in Hittite (es-a-ri "he is seated"; ki-tta-ri "he is lying"), Palaic, Luwian, and perhaps in Lydian. (The mediopassive expresses a type of reflexive meaning ["He washes himself"] or passive meaning ["He is being washed"].)

Reduplication (repetition) of the verbal stem occurs in the entire Anatolian subgroup. It adds an iterative or intensive nuance to the meaning, but it does not function in a system of tenses as in Greek. Very typical of the Anatolian subgroup are verbal suffixes like the causative -nu- (compare Hittite war- "to burn," warnu- "to kindle," harg- "to perish," harganu- "to ruin, to destroy"). In principle, these formations can be built on any verbal stem whose meaning permits such an addition. It should be stressed that in Hittite a normal expression for a "state" consists of a nominal sentence (that is, a sentence without a verb but with a noun, an adverbial expression, or a participle as predicate); sometimes, however, the verb es- "to be" is used as the carrier of modal or temporal nuances. The total absence of the Indo-European perfect (describing a "state" resulting from a recently concluded action) becomes very clear by the usage of the adverb nawi "not yet," which occurs with a present tense in Hittite (but which would employ a perfect tense, such as "has been," in English and other Indo-European tongues).

Very characteristic of the Anatolian subgroup is a strong preference for the linking together of particles and enclitic pronouns to form "chains" that are placed at the beginning of the sentence or clause. The first component of such a "chain" usually is a stressed part of the sentence or otherwise a sentence connective (like nu in Hittite, a in Luwian).

Phonological characteristics.
In the Anatolian vowel system, a, e, i, and u are present, but o is curiously absent. In Lycian, the Greek value omicron has been used for the Lycian u, but in Lydian the existing o seems to be a secondary development. A main dialectal criterion is the treatment of Anatolian e: in Old Hittite, there still was a differentiation between e and i, but in later Hittite, an -e at the end of a word changed to -i. In Luwian, e tended to appear as a. Vowel gradation (i.e., a change of vowel) that reflects meaning change plays a role in Hittite (e-es-zi "he is" versus a-sa-an-zi "they are") but was impossible in Luwian because of the sound change. Both Lycian and Lydian possess separate signs for nasalized vowels (ã and ẽ).

Advocates of the so-called laryngeal theory (first proposed by the Swiss linguist Ferdinand de Saussure in 1879) have found their postulate partly confirmed by Anatolian data. This theory maintains that the different forms of certain words in the various Indo-European subgroups can be satisfactorily explained only by assuming that all the known Indo-European languages have lost certain guttural sounds (laryngeals) that were originally present in the parent speech. In 1927 the Polish linguist Jerzy Kurylowicz and the French scholar Albert L.M. Cuny announced their discovery that in Hittite an h sound was preserved in positions in which a laryngeal would have formerly been (compare Hittite hant- "front" to English anti-; Hittite pahhur "fire" but English pyre). But the Anatolian evidence for the laryngeal theory is certainly not without problems, and the adherents of the theory consider that other laryngeals disappeared in Hittite as well. (see also Index: laryngeal consonant)

Lexical data.
Some examples of correspondences in vocabulary are given in Table 3. It has often been remarked--and not without reason--that although the grammar of the Anatolian languages is recognizably Indo-European, the vocabulary is less so. This is usually attributed to the deeply penetrating influences exercised by foreign surroundings, not only during the period of time when the "Anatolians" were "en route" but also after their arrival in Anatolia.

The relationship with the other subgroups.
The relationship of the Anatolian branch to the rest of Indo-European has often been defined in the United States on the basis of the "Indo-Hittite hypothesis"; that is, Hittite or Anatolian on the one hand and Proto-Indo-European on the other were both supposed to descend from a common parent. This hypothesis attributes too much weight to the Anatolian evidence. It was demonstrated as early as 1938 that the Anatolian branch should be placed on a par with the rest of the Indo-European subgroups and not as a coequal with Indo-European itself. The Indo-Hittite hypothesis is now rarely defended. Another extreme position states that the Hittite-Luwian-speaking group (another designation for the Anatolian subgroup) left the Indo-European parent group comparatively late, after the Greek and Armenian divisions had done so and approximately at the same time as Indo-Iranian. If this theory were true, there would be no need to use the Anatolian data for a thorough revision of the reconstructed Proto-Indo-European language, because these data would be less relevant, at least not more so than Indo-Iranian and Greek, on which the old reconstruction was based. A third opinion--prevalent in the French school of Indo-European studies--holds that the Hittite or, preferably, the "Common Anatolian" data are of special importance, because the Anatolian languages are particularly archaic. According to this theory, similarities in morphology (word elements) between the Celtic, Italic, and Hittite-Luwian groups and Tocharian (an Indo-European language of Central Asia) seem to imply that the dialects from which these groups evolved were in peripheral positions in the Indo-European language area and were probably the first to move away from the main group. (Ph.H.J.H.t.C.)

Indo-Iranian languages

The Indo-Aryan languages and the Iranian languages together constitute the Indo-Iranian language group, the easternmost major branch of the Indo-European family of languages. Indo-Aryan (Indic) languages are spoken by some 800 million persons in India, Pakistan, Sri Lanka, Nepal, Bangladesh, and other areas of the Himalayan region. In addition, languages of the Indo-Aryan group are spoken by about 5,000,000 people in Europe, Africa, the Americas, and Oceania: the Gypsy, or Romany, dialects that are distributed about parts of Asia, the Middle East, Europe, and North America are of Indo-Aryan origin. Speakers of Iranian languages number in the tens of millions and live in areas extending from Pakistan to Iran, Afghanistan, Transcaucasia, and Central Asia. Among the Indo-European languages, only Greek (in the Linear B documents) and Hittite possess records that go back farther in time than those of Indo-Iranian.

The Indo-Iranian tongues have been used as both administrative and literary languages. Old Persian was the administrative language of the early Achaemenian dynasty dating from the 6th century BC; and an eastern Middle Indo-Aryan dialect was the language of the chancellery of the Mauryan emperor Ashoka in India in the mid-3rd century BC. As literary languages, the Indo-Iranian languages were used in the texts of some of the world's great religions: Indo-Aryan for Buddhism, Hinduism, and Jainism, and Iranian for Zoroastrian and Manichaean texts. The oldest Zoroastrian texts are in dialects included under the name Avestan. Commerce, conquest, and religion spread the influence of these languages. Indo-Aryan languages, for example, penetrated deep into Southeast Asia; names in Indonesia and other areas and Sanskrit texts in Cambodia reflect this influence.

The close relation between the Iranian and Indo-Aryan groups has never been doubted. They share characteristic features that set them apart as a subgroup of Indo-European. The long and short varieties of the Indo-European vowels e, o, and a, for example, appear as long and short a: Sanskrit manas- "mind, spirit," Avestan manah-, but Greek ménos "ardour, force." (In the following examples, a macron indicates a long vowel; a breve indicates a short vowel. The spellings used in this article for Indo-Aryan and Iranian forms are traditional transliterations for the most part. In some cases, more accurate phonetic symbols are used. These can be found in the International Phonetic Alphabet.) In instances in which some Indo-European languages have an a sound, Indo-Iranian has i as a reflex of Indo-European sounds called laryngeals--e.g., Greek pater "father," Sanskrit pitr-, Avestan and Old Persian pitar-. After stems ending in long or short a, i, or u, an n occurs sometimes before the genitive (possessive) plural ending am (Avestan -am)--e.g., Sanskrit martyanam "of mortals, men" (from martya-); Avestan masyanam (from masya-); Old Persian martiyanam. (see also Index: Sanskrit language, Avestan language, Persian language)
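
The merger of the vowels e, o, and a into a, mentioned above, can be sketched in a few lines of Python. This is an illustration only: it takes the Greek form ménos (written here without its accent) as a stand-in for the older vowel qualities, which is an assumption made for the sake of the example, and the name merge_vowels is hypothetical. Vowel length and all other sound changes are ignored.

```python
# Map the vowels e and o onto a; a itself is left unchanged.
MERGER = str.maketrans({"e": "a", "o": "a"})


def merge_vowels(form: str) -> str:
    """Collapse e, o, and a into a, as in the Indo-Iranian merger described above."""
    return form.translate(MERGER)


print(merge_vowels("menos"))  # manas
```

Applied to the Greek-style vocalism, the rule yields the vowel pattern of Sanskrit manas-; the further change of final s seen in Avestan manah- is a separate development and is not modelled here.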

In addition to several other similarities in their grammatical systems, Indo-Aryan and Iranian have vocabulary items in common--e.g., such religious terms as Sanskrit yajña-, Avestan yasna- "sacrifice"; and Sanskrit hotr-, Avestan zaotar- "a certain priest"; as well as names of divinities and mythological persons, such as Sanskrit mitra-, Avestan miθra- "Mithra." Indeed, speakers of both language subgroups used the same word to refer to themselves as a people: Sanskrit arya-, Avestan airya-, Old Persian ariya- "Aryan." (see also Index: sound change)

The Indo-Aryan and Iranian language subgroups also differ from each other in a number of linguistic features, among them that Indo-Aryan has an i sound representing an Indo-European laryngeal sound not only in initial syllables but generally also in interior syllables; e.g., Sanskrit duhitr- "daughter" (cf. Greek thugáter). In Iranian, however, the sound is lost in this position; e.g., Avestan dugədar-, duγdar-. Similarly, the word for "deep" is Sanskrit gabhīra- (with ī for i), but Avestan jafra-. Iranian also lost the accompanying aspiration (a puff of breath, written as h) that is retained in certain Indo-Aryan consonants; e.g., Sanskrit dha "set, make," bhr "bear," gharma- "warm," but Avestan and Old Persian da, bar, and Avestan garəma-. Further, Iranian changed stops such as p before consonants and r and v to spirants such as f: Sanskrit pra "forth," Avestan fra, Old Persian fra; Sanskrit putra- "son," Avestan puθra-, Old Persian pussa- (ss represents a sound that is also transliterated as ç). In addition, h replaced s in Iranian except before non-nasal stops (produced by releasing the breath through the mouth) and after i, u, r, k; e.g., Avestan hapta- "seven," Sanskrit sapta-; Avestan haurva- "every, all, whole," Sanskrit sarva-. Iranian also has both xš and š sounds, resulting from different Indo-European k sounds followed by s-like sounds, but Indo-Aryan has only kṣ; e.g., Avestan xšayeiti "has power, is capable," šaeiti "dwells," but Sanskrit kṣayati, kṣeti. Iranian was also relatively conservative in retaining diphthongs that were changed to simple vowels in Indo-Aryan.
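
The conditioning of the Iranian change of s to h can be made explicit as a rule over transliterated forms. The Python sketch below is a simplified illustration under stated assumptions: it works on the plain, diacritic-free spellings used in this article, it treats p, t, k, b, d, and g as the non-nasal stops, and the names s_to_h, NON_NASAL_STOPS, and BLOCKING_PRECEDERS are hypothetical. It models this one change only, not the full development of any Avestan word.

```python
NON_NASAL_STOPS = set("ptkbdg")    # assumed inventory for this sketch
BLOCKING_PRECEDERS = set("iurk")   # s is kept after i, u, r, k


def s_to_h(form: str) -> str:
    """Replace s by h except before a non-nasal stop or after i, u, r, k."""
    out = []
    for i, ch in enumerate(form):
        if ch != "s":
            out.append(ch)
            continue
        prev = form[i - 1] if i > 0 else ""
        nxt = form[i + 1] if i + 1 < len(form) else ""
        if nxt in NON_NASAL_STOPS or prev in BLOCKING_PRECEDERS:
            out.append("s")        # environment blocks the change
        else:
            out.append("h")
    return "".join(out)


print(s_to_h("sapta"))  # hapta
print(s_to_h("sarva"))  # harva
print(s_to_h("manas"))  # manah
```

The outputs for sapta- and manas- match the Avestan forms hapta- and manah- cited in this article. For sarva- the rule gives harva, whereas the attested Avestan form is haurva-; the extra vowel reflects a further development that the sketch does not attempt to model.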

Iranian differs from Indo-Aryan in grammatical features as well. The dative singular of -a-stems ends in -ai in Iranian; e.g., Avestan masyai, Old Persian cartanaiy "to do" (an original dative singular form functioning as infinitive of the verb). In Sanskrit the ending is extended with a--martyay-a. Avestan also retains the archaic pronoun forms yus, yuzəm "you" (nominative plural); in Indo-Aryan the -s- was replaced by y (yuyam) on the model of the 1st person plural--vayam "we" (Avestan vaem, Old Persian vayam). Finally, Iranian has a 3rd person pronoun di (accusative dim) that has no counterpart in Indo-Aryan but has one in Baltic.

The original location of the Indo-Iranian group was probably to the north of modern Afghanistan, in the present-day states of Tajikistan, Uzbekistan, Kyrgyzstan, Turkmenistan, and Kazakhstan, where Iranian languages are still spoken. From there, some Iranians migrated to the south and west, the Indo-Aryans to the south and east. From geographical references in the earliest Indo-Aryan literary document, the Rigveda, it is clear that the earliest settlement of Indo-Aryans was in the northwest of the Indian subcontinent. Migration did not take place all at once; there was doubtless a series of migrations. The date of entry of the Indo-Aryans into the subcontinent cannot be precisely determined, though the beginning of the 2nd millennium BC is plausible and generally accepted. (see also Index: Veda)

There is heated controversy concerning the precise linguistic position of the language of the Indo-Iranian family first attested in Middle Eastern cuneiform texts of c. 1450-1350 BC. Some borrowed words and proper names appearing in these Hittite-Hurrian documents have been interpreted as belonging to Indo-Iranian, to an Indic subgroup of Indo-Iranian that had not yet fully split, or to Indo-Aryan proper. Complete scholarly agreement on this issue has not been reached.

The identification of the Harappan peoples of the Indus Valley, whose writing has not yet been satisfactorily deciphered, also awaits further research; with it may come a possible answer as to whether Indo-Aryans encountered these people or whether their civilization had passed by the time the Indo-Aryans arrived on the subcontinent. Whatever the answers to these problems may be, the reasons for the split of the Indo-Aryans and Iranians are not known.

In the following presentation regarding Indo-Aryan documents as evidence for linguistic history, it should be borne in mind that almost all dates are approximations.

THE INDO-ARYAN LANGUAGES

Languages of the group.
Indo-Aryan languages are assigned to three major periods: Old, Middle, and New Indo-Aryan. These periods are linguistic, not strictly chronological. Old Indo-Aryan includes different dialects and linguistic states referred to in common as Sanskrit. The most archaic Old Indo-Aryan is that of sacred texts called Vedas. Classical Sanskrit is the name given to the literary language that represents a polished form of various dialects. The late Vedic dialect described by the grammarian Panini (c. 6th century BC) is also commonly called Classical Sanskrit. Middle Indo-Aryan includes both the dialects of inscriptions from the 3rd century BC to the 4th century AD and literary languages. Apabhramsha dialects represent the latest stage of Middle Indo-Aryan development. Though all Middle Indo-Aryan languages are included under the name Prakrit, it is customary to speak of the Prakrits as excluding Apabhramsha.

New Indo-Aryan is represented by such modern vernaculars as Hindi and Bengali, which began to emerge from about the 10th century AD. These too have earlier and later stages, culminating in the present-day languages.

New Indo-Aryan languages accounted for about 490,000,000 speakers in India, or approximately 74 percent of the population in the early 1980s. Considering the approximately 85,000,000 Bengali speakers in Bangladesh, approximately 63,000,000 speakers accounted for by Punjabi and Sindhi in Pakistan, and 11,000,000 Sinhalese (Sinhala) speakers in Sri Lanka (formerly Ceylon), the total number of New Indo-Aryan speakers is well over 650,000,000. According to the latest Indian census, there are 547 mother tongues of the Indo-Aryan group in use within the bounds of postpartition (1947) India. Some of these are dialects that are used by few speakers; others are official state languages having 30,000,000 or 50,000,000 speakers. The major groups of New Indo-Aryan languages are given in Table 4. Structurally and historically, Hindi and Urdu are one, although they are now official languages of different countries written in different alphabets. The term hindi (also hindvi) is known from as early as the 13th century. The term zaban-e-urdu "language of the imperial camp" came into use in about the 17th century. In the south, Urdu was used by Muslim conquerors of the 14th century.
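
As a quick check on the order of magnitude, the four figures itemized above can be tallied directly. The short Python sketch below uses only the rounded numbers quoted in this paragraph; the dictionary keys are descriptive labels added for readability, and speakers in countries not itemized here (Nepal, for example) are not included.

```python
# Rounded speaker figures as quoted in the paragraph above.
speakers = {
    "India (New Indo-Aryan, early 1980s)": 490_000_000,
    "Bangladesh (Bengali)": 85_000_000,
    "Pakistan (Punjabi and Sindhi)": 63_000_000,
    "Sri Lanka (Sinhalese)": 11_000_000,
}

total = sum(speakers.values())
print(f"{total:,}")  # 649,000,000
```

The itemized figures alone come to 649,000,000, so any total above 650,000,000 depends on speakers outside these four countries.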

Many of the languages in Table 4 are official state languages, the media of education up to the university level and of official transactions. Hindi, written in the Devanagari script, is the co-official language (with English) of the Republic of India and is used as a lingua franca throughout North India. It has varieties according to the mother tongue of the area; e.g., Bombay Hindi and Calcutta Hindi. Each of the major state languages has several other dialects in addition to the standard dialect adopted for official purposes. Including the various dialects down to the village level, it can be said that a chain of communication stretches across North India such that each dialect forms a link with each adjacent dialect. On the level of official languages this is not so: a Gujarati speaker will not readily understand colloquial Bengali.

Historical survey of the Indo-Aryan languages.
The points noted above regarding Indo-Aryan migration make it difficult to determine the domain of Proto-Indo-Aryan, the ancestral language of all the known Indo-Aryan tongues, if indeed there was any such single region. All that can be said with certainty is that the Indo-Aryans on the subcontinent first occupied the area comprising most of present-day Punjab (both West and East), Haryana, and the Upper Doab (Ganges-Yamuna interfluve) of Uttar Pradesh. The structure of Proto-Indo-Aryan must have been close to that of early Vedic, with dialectal variations.

Old Indo-Aryan.
The most archaic Sanskrit is that of the Vedas, of which there are four major text groups called Samhitas: the Rigveda, Atharvaveda, Samaveda, and Yajurveda. The Yajurveda is in turn divided into two main branches, the White (Shukla) Yajurveda and the Black (Krishna) Yajurveda. The Rigveda, Atharvaveda, and Samaveda are purely metrical texts mainly used by priests in their ritual. The texts of the Black Yajurveda contain both verses used in ritual sacrifice (called mantras) and prose sections that are explanatory in nature, giving mythological explanations of sacrifices and objects used in them, together with etymologies (derivations of words). These sections are known as Brahmana portions. Each Veda also has a particular Brahmana connected with it. The early Vedic texts are pre-Buddhistic; a plausible date accepted for the composition of the Rigveda is between 1200 and 1000 BC, though the exact chronology of these early texts is difficult to establish. The prose passages of Brahmanas and of the early sutra (aphoristic texts) period may be called late Vedic. Also of the late Vedic period is the grammarian Panini, author of a treatise called Astadhyayi, who makes a distinction between the language of sacred texts (chandas) and the usual language of communication (bhasa).

Epic Sanskrit is so called because it is represented principally in the two epics, Mahabharata and Ramayana. In the latter the term samskrta "formed, polished" is encountered, probably for the first time with reference to the language. The date of composition for the core of early Epic Sanskrit is considered to be in the centuries just preceding the Christian era.

Classical Sanskrit is the language of the major poetic works (kavya), drama (nataka), tales such as the Hitopadesha and Pañca-tantra, and technical treatises on grammar, philosophy, and ritual. It was used not only by the poet Kalidasa and his predecessors Bhasa, a dramatist, and Ashvaghosa, a Buddhist author, in the first centuries AD but also continued in use long after Sanskrit had ceased to be a commonly used mother tongue; indeed, Sanskrit is a language of learned treatises and commentaries to this day. It is also used as a lingua franca among pandits (Brahmin scholars) from different areas of India.

Linguistic developments can be traced from the early Vedic of the Rigveda through the later Samhitas on to the late Vedic of Brahmana prose and sutras, culminating in the language described by Panini, which is tantamount to Classical Sanskrit. For example, the nominative plural form ending in -asas (devasas "gods") was already less frequent than -as in the Rigveda and continued to lose ground later; in Brahmana, -as (e.g., devas) is the normal form. There are numerous other changes evident. For example, the instrumental singular form of -a-stems ends both in -a and -ena (a pronoun ending) in the Rigveda, with the latter form predominating; thus, virya "heroic might" appears once, and viryena occurs ten times (from virya- "heroic might, act"). In later Vedic -ena is the usual ending. All the early Vedic forms are expressly classed as belonging to the sacred language (chandas) by Panini.

The verb also shows chronological differences. For example, the 1st person plural ending -masi (e.g., bharamasi "we bear") predominates over -mas in Rigvedic but not in the Atharvaveda; -mas becomes the normal ending later. Early Vedic distinguishes between the aorist, imperfect, and perfect tenses. The aorist is commonly used to refer to an action that has recently taken place; the imperfect is a narrative tense referring to actions accomplished in the distant past. The perfect form of the verb originally denoted, as in Greek, a state reached; e.g., bi-bhay-a "is afraid" (root bhi). From earliest Vedic, however, this was not always the use of the perfect. Although the grammarian Panini distinguished between the three tenses noted (he said the perfect is used to denote an action beyond one's ken), the perfect and imperfect both came to be used as narrative tenses.

There are also future forms of Vedic, formed with suffixes (-isya and -sya) and used from earliest times. A future form, composed of an agent noun of the type kartr- "doer" and followed, except in the 3rd person, by forms of the verb as "be" (e.g., kartasmi [karta asmi] "I will do"), was recognized as in common use by Panini but is rare in early Vedic.

Early Vedic had a category that went out of use by the late Vedic period of Brahmanas--the injunctive, which was formally a form with secondary endings lacking the augment, a prefixed vowel. The injunctive could be used to denote a general truth. A general truth can also be signified by the subjunctive, which is characterized by the vowel a affixed to the present, aorist, or perfect stem. Later Vedic retained the injunctive only in negative commands of the type ma vadhis "do not slay." The subjunctive also diminished slowly until it was no longer used; for Panini the subjunctive belonged to sacred literature. The functions of the subjunctive were taken over by the form called optative (and the future form).

Noun forms incorporated into the verb system are numerous in early Vedic. Rigvedic has forms with affixes ya and tva functioning as future passive participles (gerundives); e.g., vac-ya- "to be said," kar-tva- "to be performed, done." The Atharvaveda has, additionally, forms with -(i)tavya (hims-itavya- "to be injured") and -aniya (upa-jiv-aniya- "to be subsisted upon"). By late Vedic, the type with tva had been eliminated; Panini recognized as normal the types karya-, kartavya-, karaniya- "to be done." In Indo-Aryan, from earliest Vedic down to New Indo-Aryan, forms called absolutives (or gerunds) are used to denote the previous of two or more actions performed (usually) by one agent: "having done . . . he did"; for example, piba nisadya "sit down (nisadya "having sat down") and drink." Rigvedic uses tvi, tva, tvaya, (t)ya to form absolutives, but these were later reduced to two: tva with a simple verb or one compounded with the negative particle, and ya with a verb compounded with a preverb (a preposition-like form).
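
The later distribution of the two absolutive suffixes can be stated as a simple selection rule. The Python sketch below is a rough illustration under stated assumptions: it uses the diacritic-free spellings of this article, joins the elements by plain concatenation without any of the sound adjustments that occur at the joins in actual Sanskrit, and the function name absolutive and its parameters are hypothetical. The root kr "do" is taken from the forms kr-ta- and ca-kr-a cited elsewhere in this article, and the negative particle is written a.

```python
def absolutive(root: str, preverb: str = "", negated: bool = False) -> str:
    """Build an absolutive following the later two-suffix distribution.

    ya  -- verb compounded with a preverb
    tva -- simple verb, or verb compounded only with the negative particle
    """
    if preverb:
        return f"{preverb}{root}ya"
    prefix = "a" if negated else ""   # negative particle (illustrative spelling)
    return f"{prefix}{root}tva"


print(absolutive("sad", preverb="ni"))   # nisadya
print(absolutive("kr"))                  # krtva
print(absolutive("kr", negated=True))    # akrtva
```

The first output reproduces the form nisadya "having sat down" quoted above; the other two merely show which suffix the rule selects when no preverb is present.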

Early Vedic also uses various case forms of action nouns in the capacity of infinitives; e.g., dative singular -tave (da-tave "to give"), genitive singular -tos (da-tos), both from a noun in -tu, which also supplies the accusative ending -tum (da-tum). There are other types in early Vedic, but the nouns in -tu are important; in late Vedic the accusative -tum and the genitive -tos (construed with ish or shak "be able, can") became the norm. According to Panini, forms in -tum and dative singular forms of action nouns are equivalent variants: bhok-tum gacchati/ bhojanaya gacchati "He is going out to eat."

That some forms fell into disuse in the course of Indo-Aryan is natural; the above represent both chronological and dialectal modifications. Such change was recognized by Indian grammarians; e.g., Patañjali, of the mid-2nd century BC, noted that perfect forms of the type ca-kr-a "you did, have done" (2nd person plural) were not in use at his time; instead, a nominal (adjective) form kr-ta-vant-as was used, consisting of the past passive participle kr-ta- and an adjectival suffix -vant. Indian grammarians also recognized the existence of different dialects. Panini noted forms used by northerners (udicya) and easterners (pracya), as well as various dialectal uses described by grammarians who preceded him.

Earlier documents also afford evidence for dialect variation; e.g., the early Vedic of the Rigveda is a dialect in which the Indo-European l sound was for the most part replaced by r--pra "fill," pur-na- "full." This change accords with Iranian; e.g., Avestan pərəna "full." These forms contrast with Latin plenus and Gothic fulls, with l. Other dialects kept l and r distinct. There are also doublets that have both r and l in words with Indo-European r: rohita-/lohita- "red." The variant with l can be assumed to belong to an eastern dialect. This variance accords with Middle Indo-Aryan evidence and the fact that such l forms become more numerous in the tenth book (mandala) of the Rigveda, which is demonstrably more recent than the most ancient parts of the Rigveda and dates from a time when the Indo-Aryans had progressed farther east than their original location on the subcontinent. The development of retroflex l- and lh- sounds (produced by curling the tip of the tongue upward toward the hard palate) from the retroflex sounds of d (nila- "nest" from nida-) and dh when occurring between vowels is another feature characteristic of some dialects, including the major dialect of the Rigveda.

Classical Sanskrit represents a development of one or more such early Old Indo-Aryan dialects. At this stage, the archaisms noted above have been eliminated. Moreover, the accentual system of Classical Sanskrit is not the same as that of Vedic, which had a system of pitches; vowels had low, high, or circumflex (first rising, then falling) pitch, and the particular vowel of a word that received high pitch could not be predicted. In Classical Sanskrit, on the other hand, the accent was probably predictable. If the next to the last vowel was long, it received the accent; if not, the vowel preceding it was accented. The Vedic system survived at least to the time of Panini, who described it fully and did not restrict it to sacred language.
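
The accent rule just stated for Classical Sanskrit can be written out as a small procedure. The following Python sketch is illustrative only: the function name accent_index and the list-of-lengths input are assumptions made for the example, and the treatment of words with only one or two vowels is a guess, since the rule as given does not cover them.

```python
def accent_index(vowel_lengths):
    """Return the 0-based index of the accented vowel.

    vowel_lengths lists "long" or "short" for each vowel of the word,
    from first to last (a hypothetical input format for this sketch).
    """
    n = len(vowel_lengths)
    if n == 0:
        raise ValueError("a word needs at least one vowel")
    if n == 1:
        return 0  # assumption: a lone vowel carries the accent
    if vowel_lengths[-2] == "long":
        return n - 2  # a long next-to-last vowel is accented
    # Otherwise the vowel preceding the next-to-last one is accented.
    # Assumption: in a two-vowel word, fall back to the first vowel.
    return max(n - 3, 0)


print(accent_index(["short", "long", "short"]))
print(accent_index(["short", "short", "short", "short"]))
```

The function takes vowel lengths rather than a spelled word because the transliteration used here does not mark length.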

For all this simplification, Classical Sanskrit is considerably more complex than Middle Indo-Aryan. In addition to the vowels a, i, and u (in both long and short varieties), it has r and l used as vowels. Consonant clusters occur freely, except in word final position, and the system of sound modification conditioned by the context, called sandhi, is fully operative. Moreover, in its grammatical system Classical Sanskrit maintains the dual number, seven cases in addition to the vocative form (which marks the one addressed), and a complex set of alternations. For example, to the nominative singular form agni-s "fire" correspond the genitive singular agne-s "of fire," the nominative plural agnay-as "fires," and the instrumental plural agni-bhis "with fires," with differing vowels in the second syllable. There are also separate sets of nominal (noun) and pronominal (pronoun) endings. Some nouns and adjectives inflect as pronouns; e.g., ekasmai, dative singular masculine-neuter of eka- "one."

The verb system of Classical Sanskrit also maintains complex alternations. In the present tense of the type bhav-a-ti "becomes, is," the stem (bhav-a-) remains unchanged throughout the paradigm except for lengthening of the short -a- to long -a- before v and m. But other verbs have vowel alternation; e.g., as-mi "I am," s-mas "we are"; e-mi "I go," i-mas "we go"; juhomi "I pour," juhumas "we pour." A distinction is observed between active and mediopassive endings: jan-ay-a-ti "engenders" with the active ending -ti, but ja-ya-te "is born" with the mediopassive ending -te. (Mediopassive verb forms are used for the passive, reflexive, and other meanings.)

Classical Sanskrit also has a rich system of nominal and verbal derivatives. Compound words are of the following kinds: copulative (dvandva) compounds such as matapitarau "mother and father" (also elliptic pitarau "parents"); the type like tat-purusa- "his man," in which the first member is equivalent to a case other than nominative; the type like bahu-vrihi "much-rice," in which the object denoted is other than that of any of the members of the compound (bahur vrihir yasya "He who has much rice"); and adverbial compounds (avyayibhava) of the type upagni (upa-agni) "near the fire." In addition, there are derivatives with affixes -tara- and -tama, such as priya-tara- "very dear" and priya-tama- "most dear" from the adjective priya-. Pronouns have derivatives equivalent to case forms; e.g., tatra "there," yatra "where," and kutra "where?" are equivalent to locative forms such as tasmin, yasmin, and kasmin. These can also be used without a noun.

Among the derivative verbal systems are the causative and the desiderative ("desire to"); the former has an affix -ay- (gam-ay-a-ti "makes to go," kar-ay-a-ti "has [someone] do") or, after roots in -a, -pay- (stha-pay-a-ti "sets in place"). The desiderative is formed with -sa- and reduplication (repetition of a part of the root)--di-drk-sa-te "desires to see" (root drsh). The desiderative also has an agent noun in -u--di-drk-s-u "who wishes to see."

Middle Indo-Aryan.
The Sanskrit word prakrta, whence the term Prakrit, is a derivative from prakrti- "original, nature." Grammarians of the Prakrits generally consider the original from which they derive to be the Sanskrit language as described by grammarians going back to Panini. Most modern scholars consider prakrta to refer to the "natural" languages, the vernaculars, as opposed to Sanskrit, the polished language of literature and the educated (shista). There is also linguistic evidence to support this view. Several forms in the Prakrits are found in Vedic but not in Classical Sanskrit. As Classical Sanskrit is not directly derivable from any single Vedic dialect, so the Prakrits cannot be said to derive directly from Classical Sanskrit.

The most archaic literary Prakrit is Pali, the language of the Buddhist canon (c. 5th century BC) and of the later stories and commentaries of Theravada Buddhism. Pali represents essentially a western Middle Indo-Aryan dialect, though there are sufficient easternisms in the canon to have led some scholars to the view that the canon as it exists today is a recast of an original in an eastern dialect. To the Buddhist literature also belongs the Gandhari Dhammapada, the only literary text written in a dialect of the northwest. The Niya documents, official documents written in Prakrit dating from the 3rd century AD, also belong to the northwest. The earliest inscriptional Middle Indo-Aryan is that of the Ashokan inscriptions (3rd century BC). These are more or less full translations from original edicts issued in the language of the east (from the capital Pataliputra in Magadha, modern Patna in Bihar) into the languages of the areas of Ashoka's kingdom. There are other Prakrit inscriptions up to the 4th century AD, and Sanskrit was not used inscriptionally until the first centuries AD. Literary Prakrits other than Pali were also used in independent works and in dramas along with Sanskrit.

According to Prakrit grammarians, Maharastri ("From the Maharashtra Country") is the Prakrit par excellence. It is the language of kavyas (epic poems) such as the Ravanavaha (also called Setubandha) from no later than the 6th century AD. Maharastri is also the language of lyrics in Rajashekhara's Karpura-mañjari (c. 900), the only extant drama written completely in Prakrit, and of verses recited by women in the classical drama of Kalidasa and his successors, though not earlier. The literary dialect used for conversation among higher personages other than the king and his captains in the drama is Shauraseni, while Magadhi is used by lower personages.

The language of the early Jaina canon, the final version of which was made in the 5th or 6th century AD, is called Ardhamagadhi ("Half Magadhi"); the Jainas also used another literary dialect, called Jaina Maharastri, in noncanonical works. The oldest poetic work in this dialect is Vimala Suri's Paumacariya (c. 3rd century). Of other Prakrit dialects mentioned by grammarians, Paishaci (or Bhuta-Bhasa, both meaning "Language of Demons") is noteworthy; it is said to be the language of the original Brhatkatha of Gunadhya, source of the Sanskrit book of stories Katha-saritsagara.

Buddhist works were also written using a language that has been called Buddhist Hybrid Sanskrit. Among these works is the Mahavastu, the core of which is thought to date from the 2nd century BC. This language is a Middle Indo-Aryan dialect of indeterminate origin, which steadily became more Sanskritized in prose sections of later works.

The most advanced stage of Middle Indo-Aryan, Apabhramsha, was also used as a literary language. That there was literary creation in Apabhramsha by the 6th century is clear from an inscription of King Dharasena II of Valabhi, in which the King praises his father as being adept in Sanskrit, Prakrit, and Apabhramsha composition. Moreover, in the fourth act of Kalidasa's drama Vikramorvashiya there are Apabhramsha verses. Because Kalidasa probably lived in the 3rd or 4th century, literary composition in Apabhramsha is earlier still, if these verses are legitimate. There is a great deal of later literature in Apabhramsha, for the most part Jaina works; e.g., Paumacariu of Svayambhu (8th-9th century), Harivamsha-purana of Puspadanta (10th century), Sanatkumara-cariu of Haribhadra (12th century).

Middle Indo-Aryan is characterized generally by the reduction of the complexities seen in Old Indo-Aryan. The vowel system was reduced by the merger of r (and l) sounds with vowels and the change of the diphthongs ai and au to the vowel sounds e and o; e.g., Pali accha- "bear" (Sanskrit rksa-), ina- "debt" (Sanskrit rna-), uju- "straight" (Sanskrit rju-), pucchati "asks" (Sanskrit prcchati), metti- "friendship" (Sanskrit maitri-), orasa- "breast-born, legitimate" (Sanskrit aurasa-). Moreover, -aya- and -ava- commonly contracted to -e- and -o-; e.g., Pali jeti "conquers" (Sanskrit jayati), odhi- "limit" (Sanskrit avadhi-). Final consonants were deleted, with the exception of -m, which developed to a nasal -m sound (anusvara) before which a vowel was shortened (Pali bhariyam "wife"; Sanskrit bharyam). Together with the trend toward replacing variable consonant stems by unchanging stems in -a-, this change had serious consequences for the grammar. Consonant stems steadily disappeared and were transformed to stems ending in a vowel; e.g., to Sanskrit sharad- "autumn," sarit- "stream," and sarpis- "butter" correspond the Pali forms sarada-, sarita, and sappi-. Consonant clusters were also modified in Middle Indo-Aryan; e.g., Pali khetta- "field" (corresponding to Sanskrit ksetra-), Pali dakkhina- "right, south" (Sanskrit daksina-), aggi- "fire" (Sanskrit agni-), punna- "full" (Sanskrit purna-), and tanha- "thirst" (Sanskrit trsna-). The shortening of vowels before modified consonant clusters led to the use of short e and o sounds, which were unknown in Old Indo-Aryan; e.g., Pali semha- "phlegm" (Sanskrit shlesman-), ottha- "lip" (Sanskrit ostha-).
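
The vowel contractions described above are regular enough to be applied mechanically to a few of the cited forms. The following Python sketch is an illustration rather than a reconstruction tool: the name contract_vowels and the rule table are assumptions for the example, only the diphthong and -aya-/-ava- changes are modeled (consonant clusters are left alone), and the plain transliteration used here does not mark vowel length.

```python
# Rules are ordered so that the three-letter sequences are tried
# before the two-letter diphthongs.
CONTRACTIONS = [("aya", "e"), ("ava", "o"), ("ai", "e"), ("au", "o")]


def contract_vowels(form):
    """Apply the Middle Indo-Aryan vowel contractions to a Sanskrit form."""
    for old, new in CONTRACTIONS:
        form = form.replace(old, new)
    return form


for sanskrit in ("jayati", "avadhi", "aurasa"):
    print(sanskrit, "->", contract_vowels(sanskrit))
```

For Sanskrit maitri- the sketch yields metri-, not the attested Pali metti-, because the assimilation of the cluster tr to tt is a separate change that is not modeled. The text also says only that -aya- and -ava- "commonly" contracted, so the rules should not be read as exceptionless.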

The above phenomena are not restricted to Pali; they are pan-Middle Indo-Aryan. Differences between Pali and Ashokan and other Prakrits include the retention of voiceless stops (i.e., p, t, k) between vowels in Pali and Ashokan dialects; other Middle Indo-Aryan dialects modify them. The extreme development appears in literary Maharastri, in which unaspirated stops (pronounced without an accompanying audible release, or puff of breath) other than retroflexes (t, d) and labials (p, b) were deleted, aspirated stops (pronounced with an audible puff of breath) were replaced by h, retroflexes (pronounced by curling the tongue upward toward the hard palate) became voiced, and labials were replaced by v; e.g., loa- "world" (Sanskrit loka-), loana- "eye" (Sanskrit locana-), saha- "branch" (Sanskrit shakha-), padhai "recites, reads" (Sanskrit pathati), and savaha- "curse" (Sanskrit shapatha-).
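
The Maharastri treatment of stops between vowels can be sketched as a set of rewrite rules. The Python below is a minimal illustration under stated assumptions: the function names are invented for the example, capital T and D are used (as an ad hoc convention) for the retroflex stops that the plain transliteration does not distinguish from dentals, and the aspirated palatals ch and jh are left unchanged because the text does not give an example for them.

```python
VOWELS = set("aiueo")
# Two-letter segments; capital T/D mark retroflexes (assumed notation).
DIGRAPHS = ("kh", "gh", "ch", "jh", "Th", "Dh", "th", "dh", "ph", "bh", "sh")

DELETED = {"k", "g", "c", "j", "t", "d"}            # unaspirated, not retroflex or labial
TO_H = {"kh", "gh", "th", "dh", "ph", "bh"}         # aspirated stops
VOICED = {"T": "D", "Th": "Dh"}                     # retroflexes become voiced
TO_V = {"p", "b"}                                   # labials


def tokenize(word):
    """Split a transliterated word into sound segments."""
    tokens, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in DIGRAPHS else 1
        tokens.append(word[i:i + step])
        i += step
    return tokens


def maharastri(word):
    """Apply the intervocalic stop changes to a Sanskrit form."""
    tokens = tokenize(word)
    out = []
    for i, tok in enumerate(tokens):
        between_vowels = (
            0 < i < len(tokens) - 1
            and tokens[i - 1] in VOWELS
            and tokens[i + 1] in VOWELS
        )
        if not between_vowels:
            out.append(tok)
        elif tok in DELETED:
            continue
        elif tok in TO_H:
            out.append("h")
        elif tok in VOICED:
            out.append(VOICED[tok])
        elif tok in TO_V:
            out.append("v")
        else:
            out.append(tok)
    return "".join(out)


for sanskrit in ("loka", "locana", "paThati"):
    print(sanskrit, "->", maharastri(sanskrit))
```

Each stop is judged against its neighbours in the original Sanskrit form, so the rules apply simultaneously rather than feeding one another. The merger of sibilants is not modeled: for shapatha- the sketch returns shavaha-, whereas the attested Maharastri form is savaha-.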

Essentially on the same level are the dialects of Jaina texts, but in these a y glide prescribed by grammarians occurs when a consonant is elided: vayana- "face" (Sanskrit vadana-); sayala- "whole" (Sanskrit sakala-). In Shauraseni, on the other hand, voiceless stops (e.g., p, t, k) between vowels are voiced (e.g., become b, d, g, respectively); e.g., ido "hence" (Sanskrit itah); tadha "thus" (Sanskrit tatha). Though Pali and Ashokan are at an earlier level of development with respect to these changes, they share with the rest of the Middle Indo-Aryan dialects the replacement of voiced aspirated sounds between vowels by h: lahu- "light, unimportant" from laghu-; dahati "puts, places" (Sanskrit dadhati). Similarly, they share the change of dy- to j: joti- "light, brilliance" (Pali jotati "shines," Sanskrit dyotate). Pali and Ashokan, however, retain a y sound, changed to j in most other Prakrits; e.g., the pronoun ya- (feminine ya-), as in Sanskrit, opposed to ja-.

The deletion of stop consonants noted above resulted in vowel sequences within words that were unknown to Old Indo-Aryan. Similarly, the extent of sandhi modification was restricted in Middle Indo-Aryan. The Middle Indo-Aryan vowels i and u do not change to y and v before dissimilar vowels in compounds; e.g., Maharastri rattiandhaa- "dark of night" (Sanskrit ratry-andhaka-). In addition, the first of two contiguous vowels in different words is subject to deletion; e.g., Pali manas'icchasi (from manasa icchasi) "you wish in your mind."

In its grammatical system, Middle Indo-Aryan also reduced complexities. The dual number no longer exists as a separate category; for Sanskrit dvabhyam "by two," Middle Indo-Aryan has dohi(m), with the ending -hi(m) equivalent to the instrumental plural -bhis of Old Indo-Aryan. Among other changes is the replacement of the dative case by the genitive except in particular usages; e.g., the use of forms corresponding to the Old Indo-Aryan dative to denote a purpose.

In Middle Indo-Aryan, nominal and pronominal forms are no longer strictly segregated; e.g., Ashokan vijitamhi "in the kingdom" (also vijite) has a pronominal ending equivalent to Sanskrit -smin.

In the verb system, the contrast between active (-ti) and mediopassive (-te) endings was obliterated. Further, the Old Indo-Aryan distinction between aorist, imperfect, and perfect forms was eliminated. With few exceptions, the sigmatic aorist (an aorist form with s) provides the only productive preterite of early Middle Indo-Aryan: Ashokan ni-kkhamisu "they set out" (Sanskrit nir-a-kramisur). In later Prakrits verbally inflected preterites were generally eliminated; in their place was used the past participle. For example, in Shauraseni devi uva-visa, maharao vi a-ado "Sit down, my queen, the king also has arrived," the past participle a-ado (Sanskrit a-gatah) agrees with maha-rao "king" (Sanskrit maha-rajah) in number and gender. If the verb is transitive, the participle agrees with the direct object, and the agent is denoted by an instrumental form: in Jaina Maharastri, tena vi savvam sittham "He has told everything," tena "by him" denotes the agent, and sittham "told" (Sanskrit shistam) agrees with the neuter singular form savvam (Sanskrit sarvam). When no object is denoted, the verb is in the neuter singular. Old Indo-Aryan used both the participial construction and the finite verb; thus to Prakrit so vi tena samam gao "He also went with him" could correspond Sanskrit so'pi tena saha gatah or so'pi tena sahagamat (saha agamat). The Middle Indo-Aryan development eliminated the latter.
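
The agreement pattern of the participial preterite described above amounts to a three-way decision. The Python sketch below restates it; the names Noun and participle_agreement are assumptions made for the example, and the neuter singular default is applied only to transitive verbs with no expressed object, which is one reading of the statement in the text.

```python
from collections import namedtuple

# A noun or pronoun with the features the participle can agree with.
Noun = namedtuple("Noun", "form number gender")


def participle_agreement(transitive, subject, direct_object=None):
    """Return the (number, gender) shown by the past participle."""
    if not transitive:
        # Intransitive: the participle agrees with the subject.
        return (subject.number, subject.gender)
    if direct_object is None:
        # No object denoted: neuter singular.
        return ("singular", "neuter")
    # Transitive: agreement is with the direct object; the agent
    # stands in the instrumental and does not control agreement.
    return (direct_object.number, direct_object.gender)


king = Noun("maha-rao", "singular", "masculine")
agent = Noun("tena", "singular", "masculine")
everything = Noun("savvam", "singular", "neuter")

print(participle_agreement(False, king))
print(participle_agreement(True, agent, everything))
```

The two calls correspond to the Shauraseni and Jaina Maharastri sentences cited above: a-ado agrees with maha-rao, and sittham agrees with savvam rather than with the agent tena.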

Alternations of the Sanskrit type as-mi, s-mas were eliminated in Middle Indo-Aryan; the predominant type of present tense was formed from an unchanging vowel stem (Pali e-ti, e-nti "go[es]").

Nominal forms of the verb system are of the same types as Old Indo-Aryan; e.g., the Pali future passive participle katabba- (Sanskrit kartavya-) "to be done," Shauraseni karania; Ardhamagadhi, Jaina Maharastri, and Maharastri karanijja- "to be done." The infinitive is commonly formed on the present tense stem, not on the root, as in Old Indo-Aryan. Thus Pali pappotum is formed on the present pappoti; Sanskrit praptum is formed on the root prap, present tense prapnoti.

Middle Indo-Aryan shows evidence of dialectal differentiation. The earliest documents that allow one to determine roughly the dialect distribution are Ashoka's inscriptions. These represent three major dialect areas: east, as in the inscriptions of Jaugada, Dhauli, and Kalsi; west, in Girnar; and northwest, in Mansehra and Shahbazgarhi. Characteristic of the east dialect area is final -e, corresponding to -o in the west and -as in Sanskrit; in the east dialect area l also regularly corresponds to r of the west and of Sanskrit. Moreover, in the east dialect area there is a tendency to insert a vowel within consonant clusters, while in the west and northwest one of the consonants is assimilated to the other without an intervening vowel. For example, to Sanskrit rajñas "of the king" corresponds Girnar rañño, Shahbazgarhi raño, Jaugada lajine. Northwest stands apart in retaining three spirant sounds (palatal sh, retroflex s, and dental s), which merge to s elsewhere. Ashoka's eastern dialect, from the Magadha country, shows an s sound for all three Old Indo-Aryan spirants, rather than the sh sound typical of literary Magadhi. Grammatical features also show dialectal variation; e.g., the Ashokan dative singular form is -aya in the western dialects (Girnar atthaya "for the purpose of") but -aye in the east (Kalsi, Dhauli atthaye).

As noted above, the most advanced development of Middle Indo-Aryan is seen in Apabhramsha. Sound changes that are typical of Apabhramsha include the replacement of the vowel sound a by u in final syllables; e.g., karahu "you do, make," corresponding to karaha (karadha) in other Prakrits. From stems in -aya- develop forms in -au and nasalized -aũ (nasalization is here indicated by a tilde): bhadarau "honored one, king" (Prakrit bhattarayo), haũ "I" (Ashokan hakam). Nasalization also appears in environments in which earlier m occurred between vowels; e.g., gaũ "village" (from gama, Sanskrit grama). Numerous other sound changes are evident, among them the development of -s(s)- between vowels into h: taho "of him" (from Prakrit tassa, Sanskrit tasya); hohinti "will be" (compare Pali hossati). Apabhramsha contractions, such as -aya- changing to -a and -iya to -i, foreshadow New Indo-Aryan, in which the development was extended; e.g., Apabhramsha paniu "water" (Old Indo-Aryan paniyam), Gujarati pani, Hindi pani.

In other points Apabhramsha also presaged New Indo-Aryan. Its contracted forms anticipate the New Indo-Aryan opposition of masculine, neuter, and feminine nouns; thus, Apabhramsha -au, -aũ, -i, Gujarati -o, -ũ, -i (gayo, gayũ, gai "went"), Hindi -a, -i (gaya, gai). The case system of Apabhramsha is also at a more advanced level of disintegration than that of earlier Middle Indo-Aryan, with the instrumental and locative plurals being identical in form (-ahi or -ehi for -a-stems) and instrumental singular forms also being used as locatives.

In the Apabhramsha verb system, present tense stems in -a predominate. Apabhramsha verb endings differ from those of other Prakrits. Most interesting is the 3rd person plural type kara-hi "they do," which coexists with karanti. The form kara-hi, corresponding to the 3rd person singular kara-i "he does," is formed on the model of the pair kara-u (1st person singular, "I do") and kara-hu (1st person plural, "we do"). Here again Apabhramsha comes close to New Indo-Aryan. Moreover, Apabhramsha has some causative formations that do not occur elsewhere in Middle Indo-Aryan but are known from New Indo-Aryan--bham-ada-i "causes to turn," Gujarati bhamare che "causes to turn round," and pais-ara-i "causes to enter," Gujarati pɛsare che "causes to enter, to penetrate."

Also noteworthy are two syntactic usages that closely parallel those present in New Indo-Aryan. The present participle is used as a conditional; e.g., jai hau mi tena sahu tau karantu to kim asamahie sahu marantu "Even if I had performed (karantu) ascetic acts with him, would I have died without mental concentration?" in which the participles karantu and marantu have the value of conditionals. In Sanskrit the conditionals a-kar-isya-m and a-mar-isya-m are used; but in speaking Gujarati a person would say jo hu . . . karat . . . to marat, and Hindi would have the forms karta . . . marta. The Apabhramsha gerundive in -iv(v)a or -ev(v)a can be used as an infinitive; e.g., pi-evae lagga "began to drink." This is the Gujarati construction pi-va lagyo "began to drink," in which pi-va is an inflected form of pi-vu, that is, a verbal noun (infinitive) corresponding etymologically to the Apabhramsha gerundive.

Influences on Middle Indo-Aryan.
In the mid-2nd century BC, the grammarian Patañjali explained that to speak faultlessly the language now called Sanskrit (as described by Panini) one should imitate the correct speakers (called shista "learned, educated") of Aryavarta ("Country of the Aryans"). Earlier, the grammarian Katyayana (c. 4th-3rd century BC) had noted that Panini gave lists of verb roots in order that certain Middle Indo-Aryan forms not be accepted as having been correctly derived from a Sanskrit verb root. Moreover, Patañjali noted that one should study grammar in order to learn not to use incorrect words such as helayah instead of herayah (a phrase used in calling to people) or gavi instead of gauh "cow"; gavi is a Middle Indo-Aryan word. The observations of these grammarians are considered to lend support to the view that by the 6th or 5th century BC Sanskrit as a medium of learned conversation coexisted with Middle Indo-Aryan. Further, the Pali canon records that the Buddha enjoined his followers to use the vernaculars in communicating his teachings, and the Jaina canon identifies Ardhamagadhi as the language to be employed for communicating the teachings of Mahavira. Similarly, Ashoka used Middle Indo-Aryan, not Sanskrit, in the inscriptions he ordered written throughout his kingdom; Sanskrit does not appear on inscriptions until the early centuries AD (e.g., Rudradaman's inscription at Junagarh, c. AD 150). The coexistence of Old Indo-Aryan and Middle Indo-Aryan is to be accepted even for the time when the earliest Old Indo-Aryan texts were put to writing.

Middle Indo-Aryan shows similar evidence of the influence of linguistically more advanced vernaculars on literary compositions. The Prakrits of elegant literary compositions must have been artificial, different in many respects from the vernaculars current at the time, though reflecting languages that were current at some former time. The Old Indo-Aryan and Middle Indo-Aryan stages, then, present a picture of concurrent vernaculars with dialects and literary languages influenced by the vernaculars; it is impossible to compartmentalize the different stages as beginning and ending at any definite date.

The literary languages borrowed words and suffixes from earlier languages. There are Prakritisms (i.e., forms of earlier Prakrits) in Apabhramsha; e.g., the genitive singular ending -ssa instead of -ho and 2nd person plural verb forms terminating in -ha instead of -hu. All the literary Prakrits had recourse to Sanskrit as a source for borrowing words. Words that were incorporated into the Prakrits from Sanskrit with no change in form are called samskrta-sama "identical with Sanskrit" (or tat-sama "identical with that") and are contrasted with words termed samskrta-bhava (tad-bhava) "whose origin is in Sanskrit"--that is, words that the grammarians can derive from Sanskrit by using certain rules. Another class of words, called deshya (or deshi) "belonging to the area, country," includes items that the grammarians cannot derive easily from Sanskrit and that are supposed to have been in use in particular areas from early times.

Many or most of the deshya words are indeed derivable from Sanskrit, but some are of Dravidian origin; e.g., akka "sister" (Telugu akka), atta "father's sister" (Telugu atta), appa "father" (Telugu appa), ura "village" (Telugu uru), pulli "tiger" (Telugu puli). Borrowing from Dravidian occurred also at earlier times; the Dravidians originally occupied territory much farther north than they did in Middle Indo-Aryan times. The Rigveda has such words as kunda "pitcher, pot," which is doubtless of Dravidian origin (Tamil kutam "pot"). Such borrowings become more numerous in later Sanskrit. It is not always certain that borrowing proceeded from Dravidian to Indo-Aryan, however, because Dravidian languages freely borrowed from Indo-Aryan. Thus, some scholars claim that Sanskrit katu "sharp, pungent" is from Dravidian, but others claim that it is a Middle Indo-Aryan form deriving from an earlier *krt-u "cutting" (root krt). (An asterisk [*] preceding a form indicates that it is not attested but has been reconstructed as a hypothetical form.) Whatever the judgment on any individual word, it is clear that Indo-Aryan did borrow from Dravidian, and this phenomenon is important in considering a group of sounds that sets Indo-Aryan apart from the rest of Indo-European--the retroflexes. Without doubt the influence of Dravidian is to be considered as contributing to the extension of these sounds beyond their limited occurrence in inherited Indo-European items such as nida "nest" (from *ni-sd-o), is-ta "desired" (from *is-to), and stir-na "spread out" (from *str-no). The Munda languages (or, more generally, the Austro-Asiatic languages) are also a source of some borrowing into Indo-Aryan; e.g., Sanskrit jambala "mud" (Santali jobo).

In the 8th century AD, the philosopher Kumarila mentioned not only Dravidian but also Persian and Greek as sources of foreign words. Such borrowing can be traced back to early times. In the 6th century BC Darius counted Gandhara as a province of his kingdom, and Alexander the Great penetrated into northern India in the 4th century BC. From Iranian come words such as that meaning "inscription, writing, script"; in the northwest inscriptions of Ashoka the word is dipi (Old Persian dipi) and Sanskrit has lipi, the form in other Ashokan versions and in Pali. Also from Persian is Sanskrit ksatrapa "satrap"--Old Persian xsassa-pavan-. Of Greek origin are such mathematical and astronomical terms as Sanskrit kendra "centre" (Greek kéntron), jamitra "diameter" (diámetron), and hora "hour" (hora). Yavana "foreigner," originally the Greek word for Ionian, is known from as early as the time of Panini. Later, Arabic words such as tashli "trigon" came into Sanskrit.

The modern Indo-Aryan stage.
The division of the Indian subcontinent into linguistic states and even into countries (Pakistan, Bangladesh, and India) is a recent phenomenon (see Table 4). Even after independence from Britain was achieved and partition had taken place, Bombay state existed until it was split into Gujarat and Maharashtra states in 1960. The division of Punjab into Punjab and Haryana states in 1966 occurred as a result of Punjabi agitation for a separate linguistic state. Before independence, under British rule (entrenched from the 18th century), there were princely states within dialect areas; under Mughal rule (16th-18th centuries), Persian was the language of the court and of the courts of justice, and it continued in the latter function for a time under the British. Though Hindi-Urdu may have been a lingua franca, the great dialectal diversity of earlier times continued.

Some of the modern Indo-Aryan languages have literary traditions reaching back centuries, with enough textual continuity to distinguish Old, Middle, and Modern Bengali, Gujarati, and so on. Bengali can trace its literature back to Old Bengali carya-padas, late Buddhist verses thought to date from the 10th century; Gujarati literature dates from the 12th century (Shalibhadra's Bharateshvara-bahubali-rasa) and from a period when the areas of western Rajasthan and Gujarat are believed to have had a literary language in common, called Old Western Rajasthani. Jñaneshvara's commentary on the Bhagavadgita in Old Marathi dates from the 13th century and early Maithili from the 14th century (Jyotishvara's Varna-ratnakara), while Assamese literary work dates from the 14th and 15th centuries (Madhava Kandali's translation of the Ramayana, Shankaradeva's Vaisnavite works). Also of the 14th century are the Kashmiri poems of Lalla (Lallavakyani), and Nepali works have also been assigned to this epoch. The work of Jagannath Das in Old Oriya dates from the 15th century.

Amir Khosrow used the term hindvi in the 13th century, and he composed couplets that contained Hindi. In early times, however, other dialects were predominant in the midlands (Madhyadesha) as literary media, especially Braj Bhasa (e.g., Surdas' Sursagar, 16th century) and Awadhi (Ramcaritmanas of Tulsidas, 16th century). In the south, in Golconda (Andhra, near Hyderabad), Urdu poetry was seriously cultivated in the 17th century, and Urdu poets later came north to Delhi and Lucknow. Punjabi was used in Sikh works as early as the 16th century, and Sindhi was used in Sufi (Islamic) poetry of the 17th-19th centuries. In addition, there is evidence in late Middle Indo-Aryan works for the use of early New Indo-Aryan; e.g., provincial words and verses are cited.

The creation of linguistic states has reinforced the use of certain standard dialects for communication within a state in official transactions, teaching, and on the radio. In addition, attempts are being made to evolve standardized technical vocabularies in these languages. Dialectal diversity has not ceased, however, resulting in much bilingualism; for example, a native speaker of Braj Bhasa uses Hindi for communicating in large cities such as Delhi.

Moreover, the attempt to establish a single national language other than English continues. This search has its origin in the national and Hindu movements of the 19th century and continued down to the time of Mahatma Gandhi, who promoted the use of a simplified Hindi-Urdu, called Hindustani. The constitution of India, which came into force in 1950, stressed the use of Hindi, providing for it to be the official national language after a period of 15 years during which English would continue in use. When the time came, however, Hindi could not be declared the sole national language; English remains a co-official language. Though Hindi can claim to be the lingua franca of a large population in North India, other languages such as Bengali have long and great literary traditions--including the work of Nobel Prize winner Rabindranath Tagore--and equal status as intellectual languages, so that resistance to the imposition of Hindi exists. This resistance is even stronger in Dravidian-speaking southern India. The use of English as an official language entails problems, however, because with the use of state languages for education, the level of English competence is declining. Another danger is the agitation for more separate linguistic states, which threatens India with a linguistic fragmentation that harks back to earlier days.

Characteristics of the modern Indo-Aryan languages.
The trends noted in Middle Indo-Aryan continue in New Indo-Aryan. The Middle Indo-Aryan vowel sequences ai and au were changed to single vowels during the development of New Indo-Aryan, final vowels were shortened and deleted, and d and dh sounds between vowels were replaced by the sounds r and rh. The noun cases were further reduced, and the introduction of nominal (noun) forms into the verb system became more pronounced.

Literary languages tend to become somewhat removed from the usual standard colloquial. Literary, or High, Hindi, for example, tends to replace some of the Perso-Arabic vocabulary with Sanskritic items, whereas literary Urdu makes great use of Perso-Arabic words. The gap is formalized in Bengali, in which a distinction is made between the highly Sanskritic language Sadhu-Bhasa and the colloquial standard called Calit-Bhasa.

Phonology.
[Note: The forms of the words given below reflect actual pronunciation, rather than being transliterated versions of the standard orthographies. For New Indo-Aryan the symbols {schwa}, pronounced as the a in English "sofa," and a are used for the sounds earlier transcribed as a and ā, respectively; e.g., Gujarati karu "I do" and māro "beat" are now written k{schwa}ru and maro. This practice permits certain contrasts to be made among sounds that are significant in the description of dialectal features. In Kashmiri words, a is short, opposed to ā.]

Vowels in sequence contracted in early New Indo-Aryan; e.g., Old Indo-Aryan ashiti became Middle Indo-Aryan asii, Hindi and Punjabi {schwa}ssi, and Bengali asi "80." Further, ai and au sounds changed to e and o, and au to u, while iu developed into i. The diphthongs ai and au were retained well into the New Indo-Aryan period and are still pronounced in some areas; e.g., Braj Bhasa k{schwa}r{schwa}u "I do," k{schwa}r{schwa}i "he does." Middle Indo-Aryan -d- and -dh- developed into the flaps r and rh; e.g., Prakrit sadia "woman's garment," Kashmiri, Lahnda, Hindi, Gujarati, Bhojpuri, Bengali, Oriya sari "sari"; and Prakrit padh- "recite, read," Sindhi p{schwa}rh-{schwa}nu, Lahnda p{schwa}rh-{schwa}n, Hindi, Punjabi p{schwa}rh-na, Gujarati p{schwa}rh-vu, Marathi p{schwa}rh-n{schwa} "study."

Stress is not generally contrastive in New Indo-Aryan as it is, for example, in English (e.g., noun "éxport," verb "expórt"), though different areas have different rules for placing major emphasis on a given syllable. For example, in Hindi, in which vowel length is pertinent, gilá "swallowed" has major stress on the last syllable, gila "wet," on the first. In Gujarati, on the other hand, vowel length is not pertinent; the stress position depends on which vowels occur in contiguous syllables and on the structure of the syllables, whether open or closed; e.g., júno "old," but dukán "store." In Bengali each syllable of a word receives about equal stress.

The sounds that most clearly distinguish Indo-Aryan from the rest of Indo-European are the voiced aspirate stops (gh and the like, pronounced with an accompanying audible puff of breath) and the retroflexes (t and so on, pronounced by curling the tongue upward toward the hard palate). In the outlying New Indo-Aryan areas, however, the sound system is reduced. Sinhalese has no aspirated stops, Assamese has no retroflexes, and Kashmiri has no voiced aspirates. The geographic position of these languages doubtless contributed to these losses: Sinhalese coexists with Tamil, Assamese is surrounded by Tibeto-Burman languages, and Kashmiri is on the border of the Iranian area.

New Indo-Aryan shows evidence of early dialect distribution; this is discernible by considering sound changes proper to each group. The eastern group (Assamese, Bengali, Oriya) has three important changes. Long and short i and u merged; e.g., Assamese nila, Oriya nil{half-open back vowel} ({half-open back vowel} is similar to the o of "coffee" in some English dialects), Bengali nil "blue-black" but Sanskrit nila; Assamese dhuli, Bengali dhulo, Oriya dhuli "dust" but Hindi dhul and Sanskrit dhuli. The vowel sound a of Middle Indo-Aryan was replaced by {half-open back vowel} in Bengali and Oriya and {back open vowel} (similar to the o of "hot" in southern British English) in Assamese in initial position and open syllables; e.g., Bengali m{half-open back vowel}ron, Oriya m{half-open back vowel}r{half-open back vowel}n, Assamese m{back open vowel}r{back open vowel}n "death"; Sindhi m{schwa}r{schwa}no "mortal, death," Sinhalese m{schwa}r{schwa}n{schwa}, Gujarati, Marathi m{schwa}r{schwa}n (compare Sanskrit marana-). Moreover, in this group a vowel is affected by the quality of the vowel in a following syllable. For example, in Bengali ami kori "I do," the verb root has o followed by i in the next syllable, but tumi k{half-open back vowel}ro "you do" has an {half-open back vowel} sound; similarly, ami kini "I buy" but tumi keno. As a result of vowel assimilation also, Assamese has an {half-open back vowel} sound instead of {back open vowel} representing Middle Indo-Aryan a: Assamese x{half-open back vowel}hur, Bengali sosur "husband's father" (compare Hindi s{schwa}sur, Prakrit sasura-, Sanskrit shvashura-).

Assamese and Bengali are set off from Oriya. In the former two, Middle Indo-Aryan d and dh merge medially to d (then r) with a subsequent development to r in Assamese; e.g., Oriya darhi, Bengali dari, Assamese dari "beard"; Hindi, Gujarati darhi, Prakrit dadhia. Assamese is also distinguished from Bengali by several developments, among them the merger of Assamese retroflex sounds with dental sounds; e.g., Assamese ut "camel" but Bengali ut, Oriya ot{half-open back vowel}, Sindhi uthu, Lahnda, Pahari utth, and so on. Assamese also has s for earlier c and ch sounds and a z sound for j and jh; e.g., Assamese kas "glass," Bengali kac; Assamese azi "today," Oriya aji, Bengali, Hindi aj. In addition, Assamese replaced an s sound initially by x and between vowels by h--x{half-open back vowel}hur.

Particular sound changes also characterize languages of the northwest. In this group, an older voiceless stop (e.g., t) became voiced (e.g., d) after a nasal sound; in other areas, the voiceless stop is retained: Kashmiri dand, Punjabi d{schwa}nd, Sindhi d{schwa}ndu "tooth" (the d in Sindhi is an imploded stop; see below) but Assamese, Bengali, Hindi, Gujarati, Marathi dãt, Sinhalese d{schwa}t{schwa} (Sanskrit danta-). Moreover, in the northwest group a voiced stop (e.g., d) preceded by a nasal was assimilated to the latter, resulting in two nasals, which were subsequently reduced to one in some areas; in the rest of New Indo-Aryan, the vowel preceding the nasal was nasalized. Thus, Kashmiri don "churning stick," Sindhi d{schwa}nu "tribute," Punjabi d{schwa}nn "fine," Lahnda d{schwa}nn "force," Kumauni dan "roof" contrast with Assamese dãr "pole," Bengali dãr "oar," Hindi dãd "oppression, fine," and others; all forms derive from Old Indo-Aryan danda- "stick, staff, club, royal power, fine, punishment."

In the sequence of a short vowel followed by two consonants, Pahari differs from the rest of the northwest group and agrees with the rest of New Indo-Aryan. In the northwest this sequence either remained unchanged or the cluster was simplified without lengthening of the vowel; other languages generally simplified the cluster and lengthened the vowel: Punjabi bh{schwa}tt, Sindhi bh{schwa}tu, Lahnda bh{schwa}t, Kashmiri bat{central close vowel} "cooked rice, food" but Nepali, Kumauni, Hindi, Assamese, Bengali, Gujarati, Marathi bhat.

Dardic occupies a special position. The sibilant sounds did not all merge here. For example, Kashmiri, a Dardic tongue, has surah "16" with s rather than s, as in most other Indo-Aryan languages, and sat "7" with s. Further, voiced aspirated stops merged with unaspirated stops in Dardic; e.g., Kashmiri gur "horse" but Hindi ghora; Kashmiri d{half-open back vowel}d "milk" but Hindi dudh.

One major feature distinguishing Sindhi from the rest of the northwest group is the development of a series of imploded stops (also called suction stops and recursive stops), for b, d, j, and g. Implosive stops also occur in the Sindhi vicinity; for example, Kacchi has imploded b. Another feature that distinguishes Sindhi from other northwest languages, including Kacchi, is the retention of the Middle Indo-Aryan final short vowels; e.g., Sindhi {schwa}khi "eye" but Hindi ãkh (Middle Indo-Aryan akkhi-).

Punjabi is distinguished from other members of the northwest group by its tonal system, having low (`), mid ({macron}), and high ({acute}) tones. Initial voiced aspirated stops of earlier Indo-Aryan appear in Punjabi as voiceless stops with low tone on the following vowel; e.g., Punjabi kòra but Hindi ghora; Punjabi tài "2 1/2" but Hindi dhai. Non-initially, a voiced aspirate became unaspirated and the preceding vowel received high tone; thus, Punjabi dúd "milk" but Hindi dudh, and Punjabi láb "profit" but Hindi labh.
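
The two Punjabi correspondences just described can be restated as a small procedure. The following Python sketch is illustrative only and rests on several assumptions: the function name punjabi_tone is hypothetical, the table covers just the three voiced aspirates found in the examples, plain romanization does not distinguish dental from retroflex stops, the Hindi forms stand in for the earlier Indo-Aryan forms, and mid tone is left unmarked.

```python
import unicodedata

# Assumed table: voiced aspirate -> (voiceless stop, plain voiced stop).
# Only the aspirates that occur in the article's examples are listed.
VOICED_ASPIRATES = {"gh": ("k", "g"), "dh": ("t", "d"), "bh": ("p", "b")}
VOWELS = "aeiou"
LOW = "\u0300"   # combining grave accent, used here for low tone
HIGH = "\u0301"  # combining acute accent, used here for high tone


def punjabi_tone(form: str) -> str:
    """Apply the two correspondences to a romanized form (a sketch)."""
    # Initial voiced aspirate: voiceless stop, low tone on the next vowel.
    if form[:2] in VOICED_ASPIRATES:
        voiceless, _ = VOICED_ASPIRATES[form[:2]]
        rest = form[2:]
        for i, ch in enumerate(rest):
            if ch in VOWELS:
                rest = rest[: i + 1] + LOW + rest[i + 1:]
                break
        return unicodedata.normalize("NFC", voiceless + rest)

    # Non-initial voiced aspirate: loses aspiration, and the
    # preceding vowel receives high tone.
    for i in range(1, len(form) - 1):
        if form[i:i + 2] in VOICED_ASPIRATES:
            _, plain = VOICED_ASPIRATES[form[i:i + 2]]
            before, after = form[:i], form[i + 2:]
            for j in range(len(before) - 1, -1, -1):
                if before[j] in VOWELS:
                    before = before[: j + 1] + HIGH + before[j + 1:]
                    break
            return unicodedata.normalize("NFC", before + plain + after)

    # No voiced aspirate: the form is returned unchanged.
    return form


for hindi in ("ghora", "dhai", "dudh", "labh"):
    print(hindi, "->", punjabi_tone(hindi))
```

Run on the four Hindi forms cited above, the sketch yields the Punjabi forms given in the text. It handles at most one aspirate per word and is not a general account of Punjabi tone.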

Gujarati, Marathi, and Konkani in the west and southwest differ from the languages of the midlands in that, as in the east, there is no contrast between long and short i and u vowels. The i of Gujarati and Marathi vis "20" is pronounced like the ee of English "teeth," the i of Gujarati iccha and Marathi iccha "wish" like the i of "pitch," but such a difference is not contrastive, as it is in Hindi (gila "wet": gila "swallowed"). Gujarati has certain features that, in turn, set it apart from the other languages of this group. In addition to e and o sounds, it has the open vowels {half-open front}, {half-open back vowel}; e.g., c{half-open back vowel}thu "fourth" (Middle Indo-Aryan cauttha), b{half-open front}s-vu "to sit" (Middle Indo-Aryan baisai "sits"). Moreover, Gujarati has murmured vowels, generally developed from vowels followed by h; e.g., k{half-open front}h che "says" (h represents murmuring of the vowel), Old Gujarati kahai chai. Marathi and Konkani have two series of affricate sounds; e.g., c (pronounced as the ch in English "chat"; the equivalent of c in some other languages) and c (pronounced as the ts of "rats").

There was clearly mutual influence of Indo-Aryan languages at an early time, together with movement of groups of speakers (compare the position of Pahari). Thus, while Punjabi s{schwa}cc "true" is the expected form comparable to Middle Indo-Aryan sacca- (Old Indo-Aryan satya-), Hindi s{schwa}c "true" does not represent the expected outcome. The item s{schwa}c must come from the Punjabi area.

Grammar.
Like Middle Indo-Aryan, New Indo-Aryan distinguishes only two numbers--singular and plural. Unlike Middle Indo-Aryan, the New Indo-Aryan languages differ in the degree to which gender distinctions are made. Three genders are retained in the west and southwest (Gujarati, Marathi, Konkani), and this is true also of Sinhalese. Unlike Gujarati, Marathi, and Konkani, in which every noun, whether it denotes an animate being or not, has a particular gender that is unpredictable, Sinhalese restricts masculine and feminine gender to animates and neuter to inanimates. The eastern group (Assamese, Bengali, Oriya) has no grammatical gender distinctions, and two genders are distinguished elsewhere.

Over a large area of New Indo-Aryan the noun has only two cases--direct and oblique. A lack of distinction between direct and oblique cases in the plural is typical of several languages, including forms in Hindi, Gujarati, Marathi, and Bhojpuri. Direct forms are used independently, oblique forms before postpositions (words or word elements following a noun that function similarly to English prepositions) and other affixes; the combination of stem and postposition serves the function of inflected case forms of earlier Indo-Aryan. Thus, to denote an object (direct or indirect) Hindi uses the postposition ko, which occurs in direct object constructions normally only with nouns denoting animate beings; e.g., l{schwa}rke-ko dekh-ta h{half-open front} "He sees the boy," l{schwa}rke-ko mithai do "Give a sweet to the boy." Other postpositions are me "in," p{schwa}r "on," se "from, with, by means of." A large group of postpositions are linked to the noun with the affix ka (oblique form ke, feminine ki), which also is used to form adjectives (possessives); e.g., l{schwa}rke-ke sath g{schwa}ya "He went with the boy," l{schwa}rke-ke pas h{half-open front} "The boy has it" (literally, "It is by the boy"). Many such postpositions represent old nominal (noun) forms. Other New Indo-Aryan languages have systems similar to that of Hindi, though the forms of the postpositions differ.

Though the nominal (noun) system of Punjabi is very close to that of Hindi, it has separate ablative (indicating separation and source) and locative (indicating place) forms in the singular and plural, respectively, for nouns such as kotha "house"; e.g., kothiõ "from the house," kothi "in the houses." Some languages have a fuller case system than that noted above; e.g., Bengali has a genitive singular ending, a genitive plural ending, and a locative case. Similarly, Kashmiri has nominative, dative, ablative, and agentive cases. Not all such case forms are inherited from Middle Indo-Aryan. In addition to case endings, these languages also use postpositions; e.g., Kashmiri garajas-andar "in the garage," with -andar after the dative ending -as.

Adjectives behave generally in the same way as nouns but have a syntactic restriction. In Hindi the possessive is in the oblique (non-nominative) form, as is the noun after which it occurs; but in the plural, only the noun has the oblique form. Further, the formation of comparatives and superlatives with derivative affixes has been eliminated. To a Sanskrit sentence such as ime amu-bhyah adhya-tarah "These (people) are richer than those," in which the comparative adhya-tara occurs construed with the ablative form, corresponds a Hindi sentence ye un-se {schwa}mir h{nasal half-open front vowel}, in which no comparative affix is used--literally, "These are rich from (i.e., in comparison with) those." Comparable constructions with a postposition meaning "from" occur elsewhere in New Indo-Aryan.

The pronominal system of New Indo-Aryan formally resembles the Middle Indo-Aryan stage more than its noun system. For example, Gujarati hu "I," m{nasal half-open front vowel} "I" (agentive), {schwa}me "we" (also agentive) are directly comparable to Apabhramsha hau, mai, amhai. The number distinctions of the Middle Indo-Aryan pronoun have been replaced, however, by distinctions of familiarity and politeness. For example, Hindi and Bengali have a three-way distinction--Hindi ap, Bengali apni "you" are polite or honorific forms; Hindi tum, Bengali tumi are informal forms; and Hindi tu, Bengali tui are used only for inferiors and small children. (Hindi and Bengali differ, however, in the plural forms of these.) In Gujarati, on the other hand, tũ is a very familiar pronoun, whereas t{schwa}me is used generally, covering the approximate domains of Hindi ap and tum; ap, if used, strikes the hearer as fawning. Marathi has a similar system. Southwestern languages also make a distinction in the 1st person plural between inclusive and exclusive, the exclusive excluding the person spoken to. In the form of the relative pronoun and the 3rd person pronoun, languages differ in the degree to which gender distinctions are made, thus contrasting with Old and Middle Indo-Aryan, in which these forms had three genders. For example, Marathi has masculine, feminine, and neuter for the relative pronoun, while Bengali has animate and inanimate.

New Indo-Aryan languages differ in the degree to which finite verb forms have been replaced by nominal (noun) forms. In Bengali a contrast is made between continuous or actual present (English "be . . . -ing") and non-continuous or habitual present; e.g., ami kaj kor-i "I work" (literally, "I do work"), with the ending -i, contrasts with ami kaj kor-ch-i "I am working," in which ch intervenes between the root and the ending. Hindi has a similar contrast but uses nominal forms; e.g., m{nasal half-open front vowel} kam k{schwa}r-ta hu "I work," m{nasal half-open front vowel} kam k{schwa}r r{schwa}h-a hu "I am working." Both contain the finite form hu of the auxiliary; but k{schwa}r-ta and r{schwa}h-a are nominal forms, the latter the past of r{schwa}h- "stay." Gujarati has both types, the present tense using finite verb forms, the imperfect employing nominal forms; e.g., hu kam k{schwa}ru chu "I work, am working" and hu kam k{schwa}r-to h{schwa}-to "I was working, used to work." Even in areas in which finite forms are not used in the present, they occur in the imperative forms and what may be called the subjunctive; e.g., Hindi tum kam k{schwa}r-o "work," m{nasal half-open front vowel} {schwa}nd{schwa}r au "May I come in?"

The person-number system of the New Indo-Aryan verb accords with the use of pronouns. For example, the forms ja-o, k{schwa}r-o in Gujarati t{schwa}me kyã jao cho "Where are you going?" and su k{schwa}ro cho "What are you doing?" are historically plurals but are used with reference to one person addressed by the pronoun t{schwa}me. Similarly, in Hindi, in which a person distinction is not made in the plural, ap k{schwa}hã ja r{schwa}he h{nasal half-open front vowel}, ap kya k{schwa}r r{schwa}he h{nasal half-open front vowel}, equivalent in meaning to the Gujarati sentences, have the plural form r{schwa}he h{nasal half-open front vowel}. Bengali has completely given up any number distinction in verb forms: ami/amra kori "I/we do." In the 3rd person a distinction is made between ordinary and honorific: se (ordinary)/tini k{half-open back vowel}ren, plural tara/tãra k{schwa}ren. Other languages (e.g., Hindi) also have honorific forms, for which the plural is used.

In the formation of the future there are again regional differences. Some retain the future in -s- (Gujarati hu k{schwa}r-is, 3rd person e k{schwa}r-s-e) or -h- (e.g., eastern dialects of Braj Bhasa, c{schwa}lih{schwa} õ "I will go"). Characteristic of the Eastern languages and of Bihari (including Bhojpuri, Magahi, Maithili) is the suffix -b-; e.g., Bengali jabe "will go." All of these are finite forms. On the other hand, in Hindi and adjoining areas, the future is inflected for gender.

A similar contrast between the use of verbal and nominally inflected forms also appears in the past tense forms. The predominant pattern in New Indo-Aryan is that of Middle Indo-Aryan: forms are used that are etymologically participles.

The New Indo-Aryan languages retain the passive and causative forms. The causative is conservative in retaining both the affixes that appear in Middle Indo-Aryan and vowel alternation. The passive is also formed by affixation in some areas. But many languages also have a compound formation involving the verb ja "go" and an auxiliary (h{nasal half-open front vowel}); e.g., Hindi yahã hindi bol-i ja-t-i h{nasal half-open front vowel} "Hindi is spoken here."

There are other auxiliaries, which, like h{nasal half-open front vowel}, can occur with any verb in the language; e.g., the verb "can," Hindi s{schwa}k-, Gujarati s{schwa}k. A characteristic feature of New Indo-Aryan, however, is the use of certain verbs, variously called vector verbs or compound verbs, in restricted contexts and with particular semantics. For example, one can say m{schwa}r g{schwa}-ya "He died," bhul g{schwa}-ya "He forgot," bol uth-a "He blurted out" in Hindi, using the verbs ja "go" (masculine singular past g{schwa}-ya), uth "stand up." This phenomenon is pan-Indo-Aryan and still requires investigation.

The examples cited above also illustrate the normal word order in New Indo-Aryan languages: subject (including agential forms), object (with attributive adjectives preceding), verb (together with auxiliaries). Adverbials can precede the full sentence or occur after the subject, with slight differences in emphasis; e.g., Hindi m{nasal half-open front vowel} k{schwa}l auga, or k{schwa}l m{nasal half-open front vowel} auga "I will come tomorrow (k{schwa}l)." Relative clauses normally precede correlatives: Hindi jo admi k{schwa}l tumhare gh{schwa}r-me tha vo k{half-open back vowel}n h{half-open front} "Who (k{half-open back vowel}n) is the man (admi) who (jo) was in your house yesterday?" A notable exception to the normal final position for verbs occurs in Kashmiri, in which the verb usually occurs in second position after the subject; thus, to Hindi vo kha r{schwa}ha h{half-open front} "he is eating" corresponds Kashmiri su chu khavan with the auxiliary chu after the subject.

Vocabulary.
The two most important sources of non-Indo-Aryan vocabulary in New Indo-Aryan are Persian (including Arabic items introduced through Persian), the court language of the Mughals, and English. The Perso-Arabic vocabulary permeates every aspect of New Indo-Aryan vocabulary, especially in the midlands (Uttar Pradesh through the Punjab). There are, of course, Hindi-Urdu words proper to Islam: Hindi kuran "Qur'an," 'id (name of a holy day), n{schwa}maz (certain prayers), m{schwa}sjid "mosque," as well as the word for "religion," m{schwa}zh{schwa}b. In addition, there are numerous Perso-Arabic military and administrative terms (kila "fort," s{schwa}var "horseman," {schwa}dal{schwa}t "court of justice"); architectural and geographic terms (imar{schwa}t "building," m{schwa}kan "house," m{schwa}h{schwa}l "palace," duniya "world," ilaka "province"); words having to do with learning and writing (k{schwa}l{schwa}m "pen," kitab "book," {schwa}d{schwa}b "literature, good manners") and with apparel (jeb "pocket," moja "socks," rumal "handkerchief") and anatomy (khun "blood," g{schwa}rd{schwa}n "neck," dil "heart," bazu "arm," s{schwa}r "head"). Indeed some of the most common vocabulary is of this origin: tarikh "date," v{schwa}kt "time," sal "year," h{schwa}fta "week," um{schwa}r "age," admi "man," {half-open back vowel}r{schwa}t "woman," and others. Even the grammatical apparatus of postpositions and conjunctions reflects Perso-Arabic influence; e.g., -ke bad "after," {schwa}g{schwa}r "if," m{schwa}g{schwa}r "but," ya "or."

The colloquial language used by any Hindu or Muslim communicating in Hindi-Urdu will contain a large number of such words. There have been efforts to polarize the two, and at times champions of Indo-Aryan have tried to replace Perso-Arabic vocabulary with Sanskritic words. The style that tends toward eliminating all but the most common Perso-Arabic words may be called High Hindi, written in the Devanagari script, as opposed to High Urdu, which retains Perso-Arabic of long standing, uses Persian and Arabic for learned vocabulary and is written in the Perso-Arabic script.

The influence of English as a source of borrowing still continues, and it is rare to hear a conversation on any technical subject among speakers of any Indian language in which English words are not liberally used. Among loanwords from English are names of conveyances such as Hindi rel-gari "railroad-train" and t{half-open front}ksi "taxi"; profession names such as injinir "engineer," j{schwa}j "judge," dakt{schwa}r "Western doctor," pulis "police"; and terms of educational administration such as kal{schwa}j "college" and yuniv{schwa}rsiti "university." English words are susceptible to replacement in India by Sanskritic ones as are those of Perso-Arabic origin.

Of much lesser magnitude are New Indo-Aryan borrowings from other languages, among them Portuguese and Turkic. From the latter, the word urdu came to be used as the name of a language. From Portuguese come such Hindi words as {schwa}n{schwa}nnas "pineapple," pau "(Western style) bread," k{schwa}miz "(Western) shirt," k{schwa}mra "room," and girja "(Christian) church."

Writing systems.
Ancient India had two main scripts in which Indo-Aryan languages were written. Kharosti, used in the northwest, is of Aramaic origin and is written from right to left; Brahmi, of North Semitic origin, is written from left to right and appears earliest on Ashokan inscriptions in areas other than the northwest. Most scripts of New Indo-Aryan are developments of the Brahmi. The Devanagari (or simply Nagari), used for writing Sanskrit documents in North India, is the script of Hindi and Marathi as well as Nepali. Gujarati uses a more cursive derivative. Devanagari also is used, mainly among Hindus, for Kashmiri, which has, in addition, a traditional script called Sarada, which is not now in common use. The Perso-Arabic script is used instead. Also usually written in Perso-Arabic are Urdu and Sindhi (for which the Devanagari also is used in schools in India), whereas Punjabi employs it in Pakistan as well as a particular script of its own, known as Gurmukhi ("From the Teacher's Mouth") in the sacred writings of the Sikhs. In the east, the scripts used for Bengali and Assamese are closely related; and that of Oriya, related to the other two, is highly cursive like that of neighbouring Dravidian languages. Such is also the case with Sinhalese.

The traditional alphabets are both over-explicit and not clear enough with regard to accurate representation of the spoken word. As systems in which a consonant symbol with no other accessory symbol accompanying it stands for the syllable consisting of the consonant followed by short a, they require previous knowledge of items for correct interpretation; Hindi k{schwa}rta is written ka-ra-ta in the Devanagari, and, to pronounce it properly, one must know that the word has only two syllables. Although Bengali has only the spirant sound s, the alphabet has symbols for sh, s, and s, as in Old Indo-Aryan; but verb forms such as kori and k{schwa}ren are written ka-ri and ka-re-na, both with the same initial symbol. And, though syllabic r was lost as early as Middle Indo-Aryan, the scripts have a separate symbol for this. Script reform has been suggested; it has even been proposed that all Indo-Aryan languages adopt a Latin (roman) alphabet with diacritics, but chances for this are poor. (See also WRITING: Indian alphabets.) (Ge.Ca.)
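
The point about the inherent vowel can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not a transliteration tool: the function name naive_syllables is hypothetical, and the tables cover only the handful of Devanagari characters needed for the Hindi word discussed above. The sketch reads each bare consonant symbol as consonant plus short a, which is what the script literally encodes.

```python
# Assumed, deliberately tiny tables; a real script table is far larger.
CONSONANTS = {"\u0915": "k", "\u0930": "r", "\u0924": "t", "\u0928": "n"}
VOWEL_SIGNS = {"\u093e": "\u0101", "\u093f": "i", "\u0947": "e"}
VIRAMA = "\u094d"  # sign that cancels the inherent vowel


def naive_syllables(word: str) -> list[str]:
    """Read a Devanagari word symbol by symbol, with no lexical knowledge."""
    syllables = []
    i = 0
    while i < len(word):
        ch = word[i]
        if ch not in CONSONANTS:
            raise ValueError(f"character not in the illustrative table: {ch!r}")
        base = CONSONANTS[ch]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if nxt in VOWEL_SIGNS:
            # An explicit vowel sign replaces the inherent vowel.
            syllables.append(base + VOWEL_SIGNS[nxt])
            i += 2
        elif nxt == VIRAMA:
            # The virama marks a bare consonant.
            syllables.append(base)
            i += 2
        else:
            # A bare consonant symbol stands for consonant + short a.
            syllables.append(base + "a")
            i += 1
    return syllables


word = "\u0915\u0930\u0924\u093e"  # the written form of the Hindi word above
print("-".join(naive_syllables(word)))
```

The symbol-by-symbol reading produces three written units, ka-ra-ta with a long final vowel, whereas the spoken word has two syllables. Nothing in the written form marks which inherent vowel is silent, which is why the reader must already know the word.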

THE IRANIAN LANGUAGES

Languages of the group.
The various Iranian languages fall distinctly into three categories--Ancient, Middle, and Modern Iranian.

Ancient (Old) Iranian.
Of the ancient Iranian languages, only two are known from texts or inscriptions, Avestan and Old Persian, the oldest parts of which date from the 6th century BC. Avestan was probably spoken in northeastern Iran, and Old Persian is known to have been used in southwestern Iran. Other ancient Iranian languages must have existed, and indirect evidence is available concerning some of these. Thus, from the 5th-century-BC historian Herodotus, the Median word for "female dog" (spaka) is known, and a number of Median loanwords have been recognized in the Old Persian inscriptions. In addition, a number of Median personal names are attested in various sources. It is likely that all those languages that are known only from the Middle Iranian period were in fact spoken in a less developed form in the ancient period. It is possible that the same observation applies to some of those modern Iranian languages that are not attested in the earlier periods.

The degree of mutual intelligibility that existed among the ancient Iranian languages is not known with certainty. The differences in the nature of the surviving sources have to be borne in mind. On the one hand, there is the religious poetry of Zoroaster in the Avestan language and, on the other, the official inscriptions of the Achaemenid rulers in Old Persian. Differences in the method of transmission present a further difficulty in the way of direct comparison. Nevertheless, it can safely be stated that the degree of mutual intelligibility must have been much greater between the ancient languages than between the Middle Iranian languages and that those languages geographically closer to each other probably were mutually understood better than those spoken in areas farther apart.

Avestan can hardly be said to be known beyond the ancient period, although only the earliest texts, the Gathas, are as old as the 6th century BC, and the later texts represent the language of several subsequent centuries. Old Persian, on the other hand, itself spanning the 6th to the 4th century BC, was continued more or less directly by the various forms of Middle Persian. Even in this case, however, although both Old and Middle Persian represent the language of the royal court, the considerable differences between them remain unexplained.

Middle Iranian.
Middle Persian is known in three forms, not entirely homogeneous--inscriptional Middle Persian, Pahlavi (often more precisely called Book Pahlavi), and Manichaean Middle Persian. Middle Persian belongs to the period 300 BC to AD 950 and was, like Old Persian, the language of southwestern Iran. In the northeast and northwest the language spoken was Parthian, which is known from inscriptions and from Manichaean texts. There are no significant linguistic differences in the Parthian of these two sources. Most Parthian belongs to the first three centuries AD.

Middle Persian and Parthian were doubtless similar enough to be mutually intelligible, but they differ so greatly from the eastern group of Middle Iranian languages that these must have appeared to be almost foreign languages. The languages of the eastern group, moreover, cannot have been themselves mutually intelligible. The main known languages of this group are Khwarezmian (Chorasmian), Sogdian, and Saka. Less well known are Old Ossetic (Scytho-Sarmatian) and Bactrian, but from what is known it would seem likely that these languages were equally distinctive. There was probably more than one dialect of each of the languages of the eastern group, although there is certainty only in the case of Saka, for which at least two dialects are clearly attested. The main Saka dialect is known as Khotanese, but a small amount of material survives in a closely related dialect called Tumshuq, formerly known as Maralbashi.

A few words are known in all of these eastern Iranian languages from as early as the 2nd to the 4th century AD, but substantial evidence begins for Sogdian in the 4th century, for Saka probably no earlier than the 7th century (though that for Tumshuq may be a few centuries older), and for Khwarezmian not until the 12th century and later. The principal evidence for Bactrian belongs to the 2nd century. To the same period belong the Scytho-Sarmatian names of the earliest inscriptions.

All the eastern Iranian languages of the Middle Iranian period were spoken in Central Asia, with the exception of the language of the Scytho-Sarmatian inscriptions from what is now Ukraine, north of the Black Sea. More precisely, Bactrian was spoken in northern Afghanistan and in the adjacent parts of Central Asia. Khwarezmian was the language of Khwarezm, a historic region in present-day Turkmenistan and Uzbekistan but formerly of greater extent. Scholars believe that Sogdian was probably spoken over most of Central Asia, especially in eastern Uzbekistan, Tajikistan, and western Kyrgyzstan. There were also colonies of Sogdians in various cities along the trade routes to China; in fact, most Sogdian material comes from outside Sogdiana. The Saka dialects, Khotanese and Tumshuq, were spoken in Chinese Turkistan, modern Sinkiang; Tumshuq is the name of a small village in the extreme west of Sinkiang. Khotanese was spoken in Khotan near the modern city of Khotan (Chinese Ho-t'ien [Hotan]) on the southern route across the Takla Makan Desert and within about 100 miles (160 kilometres) to the north and to the east of Khotan, where manuscripts have been found, mainly at the sites of former shrines and monasteries.

Modern Iranian.
The discontinuity already observed between Old and Middle Iranian is even more striking between Middle and Modern Iranian. There are no modern counterparts to Khwarezmian, Bactrian, and Saka, and there is no direct continuity in the case of any of the other Middle Iranian languages. Even Modern Persian does not represent a straightforward continuation of Middle Persian but is rather a koine (a dialect or language of a small area that becomes a common or standard language of a larger area), based mainly on Middle Persian and Parthian but including elements from other languages and dialects. Although Sogdian is known in several forms, possibly representing different dialects, none of these can be considered the direct ancestor of modern Yaghnabi, spoken at present in the valley of the Yaghnob River, a tributary of the Zeravshan. Yaghnabi, nevertheless, certainly belongs linguistically to the Sogdian family. Similarly, the languages of the Scytho-Sarmatian inscriptions may represent dialects of a language family of which Modern Ossetic is a continuation, but they do not simply represent the same language at an earlier date.

Only four of the many modern Iranian languages are the official languages of the state in which they are spoken. The chief of these is Persian (known in Persian as Farsi), the national language of Iran, which is spoken by about 27,000,000 people as a native language. A dialect of Persian known as Dari is recognized, moreover, as a second language in Afghanistan. The national language of Afghanistan is the East Iranian language known as Pashto, of which there are some 9,000,000 speakers, many living in Pakistan. Tajik is spoken by at least 7,000,000 people widely spread throughout Tajikistan and the rest of Central Asia and is readily intelligible to speakers of Persian, to which it is very closely related, although it is in some respects more archaic.

In addition to being the national language of Tajikistan, Tajik is important as the lingua franca of the Pamirs mountain range, a region where a remarkable variety of Iranian languages and dialects are spoken. Some 700,000 people speak Ossetic. Most of the Ossetes live in North Ossetia in Russia and South Ossetia in Georgia. Although spoken in the heart of the Caucasus Mountains, Ossetic is an East Iranian language not mutually intelligible with any other Iranian language.

Two other Iranian languages, Kurdish and Balochi (Baluchi), are spoken over a vast area, although they have not been officially accepted as the national language of an established state. Kurdish is spoken by more than 10,000,000 people living in Iran, Iraq, Turkey, Syria, and Transcaucasia. More than 5,000,000 people speak Balochi as their chief language; they are spread widely over parts of eastern Iran, Pakistan, Afghanistan, and Central Asia. In Iran, Balochi speakers live mainly in Baluchistan, a region in the southeast that now forms part of a province with Sistan. In Pakistan, Balochi speakers live mainly in the southwestern province of Balochistan; in Central Asia, they are found mainly around Mary (Merv) in southern Turkmenistan; and in Afghanistan, they are widely scattered, mainly over the southwestern portion of the country. There is a sizable Balochi colony in Oman, and many Balochi merchants have settled in the sheikhdoms of southern Arabia and along the east coast of Africa as far south as Kenya. Linguistically, Balochi and Kurdish are both West Iranian languages. Balochi is thus much more closely related to Kurdish than it is to its close neighbour Pashto. According to the most likely theory, the present eastern location of Balochi speakers is the result of migrations from the region of the Caspian Sea during the Middle Ages.

Dialects.
The six modern Iranian languages discussed above are the only ones that have an established literary tradition. They are not, however, homogeneous, each having its own dialect divisions. No definitive dialect classification has yet been made, nor indeed has any attempt at systematic classification of the whole range of Iranian languages won wide acceptance. The usual practice, followed here, is simply to list the main languages in groups of varying size, arranged on a roughly geographic basis.

There are two main dialects of Ossetic: the eastern, known as Iron, and the western, known as Digor (Digoron). Of these, Digor is the more archaic, Iron words being often a syllable shorter than their Digor counterparts--e.g., Digor madä, Iron mad "mother." Iron is spoken by the majority of Ossetic speakers and is the basis of the literary language. Chosen in the 19th century for the translation of the Bible, it is still the official language today. Little is known of the other Ossetic dialects. A small amount of the Ossetic dialect of Tual in the south, which differs little from Iron, was published in Georgian script at the beginning of the 19th century.

Yaghnabi is still spoken by a small number of people southeast of Samarkand, Uzbekistan. It has two main dialects, eastern and western, which differ only slightly. The characteristic difference is between a western t sound and an eastern s sound from an older θ sound (as th in English "thin")--e.g., western met, eastern mes "day," beside Sogdian meθ (Christian Sogdian myθ).

Dialects of the Shughni group are spoken in the Pamirs. Closely related to this group is Yazgulami. Some scholars have postulated a period of a common Yazgulami-Shughni protolanguage, which separated first into Yazgulami and Common Shughni; Common Shughni then gradually divided into Sarikoli, Oroshori-Bartangi, Roshani-Khufi, and Bajuvi-Shughni. Sarikoli, the easternmost of these dialects, is spoken in northwestern China.

Speakers of Wakhi number 10,000 or so in the region of the upper Pyandzh (Panj) River. Vakhan (Wakhan), the Persian name for the region in which Wakhi is spoken, is based on the local name Wux, a Wakhi development of *Waxsu, the old name of the Oxus River (modern Amu Darya). (An asterisk denotes a hypothetical, unattested, reconstructed form or word.) The Wakhi language is remarkably distinct from its neighbours and has many archaic features.

Around the bend of the Amu Darya and in the valley of the Varduj River to the southeast, a few people speak dialects of the Sanglechi-Ishkashmi group. This group is clearly distinguished from its neighbours but is closely related to the other languages of the Pamirs.

Some 6,000 people speak dialects of the Yidgha-Munji group. Monjan is a very remote valley located in northern Afghanistan, and it is separated by a mountain pass from the Sanglechi-speaking region. Yidgha is spoken in the valley of the Lutkho River and in the nearby city of Chitral, a region now in Pakistan. Yidgha-Munji is most closely related to Pashto.

The existence of two dialectal groups within Pashto has long been known. Thus, the word Pashto represents a southwestern dialect form (pasto), in contrast to a northeastern (paxto). According to one hypothesis, Pashto literature, which exists certainly from the 17th cent