CarolAnn Edie and Jeffrey Heinz
Welcome! This project presents an atheoretical description of the documented stress patterns of the world’s languages and provides a reference for those interested in stress patterns across languages. The database includes 422 languages and 109 distinct stress patterns, as well as the results of the Forward-Backward Learner (Heinz 2007). It is built on open-source software (MySQL, http://www.mysql.com) and is freely available. The stress patterns are collected primarily from the typologies of Gordon (2002) and Bailey (1995). Like all databases, this is a work in progress. Please send any questions or corrections to Jeff Heinz: firstname.lastname@example.org. Another stress database on the web, StressTyp, includes some information not present here, and people interested in stress should also consult it: http://stresstyp.unleiden.net/form2b.htm.
The database interface comprises 18 tables. Each is either a data table or a linking table. Together they describe languages, their FSA representations, and the results of the learning model.
Data tables simply list different values for one datum. For example, the Languages table lists the languages and values relevant to them. Similarly, in the Primary Patterns table, each record represents a primary stress pattern. The data tables in the database are: Languages, Primary Patterns, Secondary Patterns, Phonotactics, Sources, Sizes, Weights, Notes, Syllables, and the three views of the FSA table (Learner Results nd (fsa), Learner Results gram/prec (fsa), Distinct Pattern Properties (fsa)).
Linking tables, such as the Languages Primary table, link a record in one table with a record in another table. For example, the Languages Primary table associates each language with a primary stress pattern (see 3). A linking table may also include a further field associated with the link itself, such as the ‘sizes’ field in the Languages Primary table, which indicates the word sizes to which the pattern applies in that language.
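The relationship between data tables and a linking table can be sketched with a small in-memory database. The table and column names below (languages, primary_patterns, languages_primary, lang_id, pattern_id, spc) are hypothetical stand-ins rather than the database's actual schema, and SQLite is used here only so that the sketch runs without a MySQL server. The two language/pattern pairs are the ones cited in section 3; the ‘sizes’ values are left empty rather than guessed.

```python
import sqlite3

# Hypothetical schema: these names are illustrative, not the real ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languages (lang_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE primary_patterns (pattern_id INTEGER PRIMARY KEY, spc TEXT);
    -- Linking table: each row pairs one language with one primary pattern
    -- and carries the extra 'sizes' field that belongs to that pairing.
    CREATE TABLE languages_primary (
        lang_id INTEGER REFERENCES languages(lang_id),
        pattern_id INTEGER REFERENCES primary_patterns(pattern_id),
        sizes TEXT
    );
""")
conn.executemany("INSERT INTO languages VALUES (?, ?)",
                 [(1, "Afrikaans"), (2, "Maidu")])
conn.executemany("INSERT INTO primary_patterns VALUES (?, ?)",
                 [(1, "1L"), (2, "12/2L")])
# 'sizes' is left NULL: the actual values live in the database itself.
conn.executemany("INSERT INTO languages_primary VALUES (?, ?, ?)",
                 [(1, 1, None), (2, 2, None)])

# Follow the links in both directions to list each language with its pattern.
rows = conn.execute("""
    SELECT l.name, p.spc, lp.sizes
    FROM languages_primary AS lp
    JOIN languages AS l ON l.lang_id = lp.lang_id
    JOIN primary_patterns AS p ON p.pattern_id = lp.pattern_id
    ORDER BY l.name
""").fetchall()

for name, spc, sizes in rows:
    print(name, spc, sizes)
```

Keeping ‘sizes’ on the linking table rather than on either data table reflects the description above: the word size belongs to the pairing of a language with a pattern, not to the language or the pattern alone.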
The database includes atheoretical descriptions, in English prose, of stress patterns, finite state representations of the stress patterns, and the results of the Heinz (2007) learning model. Additionally, we hope to add information about the complexity of the patterns with respect to the Subregular Hierarchy (McNaughton and Papert 1971, Pullum and Rogers 2007).
While navigating the database, you will find 18 tables listed at the top of the page. The following section specifies what these tables mean and what they contain. It is separated into two halves: Data Tables and Linking Tables.
Data tables simply list different values for one datum.
This table contains information about each language, the domain in which the stress pattern is valid, and other information more generally about the language. A language may be listed more than once if different stress patterns occur in different domains, or if different researchers provide different descriptions.
On the Languages table, clicking the link in the language ID field will bring you to the Language Page. This page brings together much of the salient information in the database about one particular language. This includes the relevant Ethnologue code(s), if any, the type of stress pattern ((un)bounded, quantity (in)sensitive), prose statements of primary and secondary stress patterns, a Syllable Weight Hierarchy chart, statement(s) of relevant phonotactic information, sources, and finite state acceptor informational charts and diagrams.
This table lists each primary stress pattern included in the database by a code. (For information on SPC codes, see 3.)
This table lists each secondary stress pattern included in the database by a code. (For information on SPC codes, see 3.)
This table lists phonotactic patterns that researchers have described, each with a finite state acceptor that accounts for it.
This table lists the bibliographic information of the sources referenced in the database.
This table explicates the size-code, as seen in the Languages Primary and Languages Secondary tables. These codes, found in the ‘size’ field, for example “5+”, are elaborated in prose in the ‘English’ field. 5+ would be ‘five or more syllables’, 5- would be ‘less than five syllables’, and just 5 would be ‘exactly five syllables’.
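The mapping from size code to prose can be sketched as a small function. The function name `describe_size` is hypothetical, and the sketch prints numerals rather than number words; it follows the three readings given above for this table and is not the database's own code.

```python
import re

def describe_size(code: str) -> str:
    """Turn a size code such as '5+', '5-' or '5' into an English phrase."""
    match = re.fullmatch(r"(\d+)([+-]?)", code.strip())
    if match is None:
        raise ValueError(f"not a size code: {code!r}")
    n, suffix = int(match.group(1)), match.group(2)
    if suffix == "+":
        return f"{n} or more syllables"
    if suffix == "-":
        # Reading used in the description of this table ('less than').
        return f"fewer than {n} syllables"
    return f"exactly {n} syllables"

print(describe_size("5+"))
print(describe_size("5-"))
print(describe_size("5"))
```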
This table lists the possibilities for the numbers of weights that are distinct in a language. W0 means there are no weight distinctions, W1 means there is a light/heavy distinction, and so on. For each possibility, there is a finite state acceptor in the ‘fsa’ field that generates it.
This table is a simple list of notes that were made on various languages. These notes are associated with certain languages in the Language Notes table. See 2.2.6.
This table indicates types of syllable patterns. It is organized from lightest to heaviest, from W0 (weight zero) to W4 (weight four). The information within the cells refers to syllable structure and consonant and vowel features.
The following three tables are in fact three views of a single table. This table contains the distinct stress patterns that exist in the database.
This table displays a list of representative languages (that is, one language is listed for each combination of primary and secondary stress patterns in the database), summaries of their primary and secondary stress patterns in the expanded SPCs (see 3), and phonotactic information. It then presents information about the neighborhood-distinct learners and the finite state acceptors obtained by such learners.
The number in each of the next three fields is the maximum word length, in syllables, that had to be presented for the learner to converge to the pattern.
The next nine fields have values true or false.
This table displays a list of representative languages, summaries of their primary and secondary stress patterns in the expanded SPCs (see 3), and phonotactic information, and then presents information about the learner results. This information is included in the following fields:
Since the precedence learner returned languages which were strict supersets of the languages returned by the n-gram learners, the prNg fields track the n-gram fields exactly.
This table displays a list of representative languages ordered by type of pattern, along with summaries of their primary and secondary stress patterns in the expanded SPCs (see 3) and phonotactic information. It specifies the type of pattern (quantity (in)sensitive, (un)bounded) and presents information about the FSA.
Linking Tables serve to link one record from one table with a record in another table.
This table links languages in the database with the syllable priority code (see 3) for their primary stress pattern (if any). The size (in number of syllables) of words in which this pattern is valid is indicated in the ‘1sizes’ field (see 2.1.6). The ‘notes’ field is used internally during development to keep track of things.
This table links languages in the database with the syllable priority code (see 3) for their secondary stress pattern (if any). The size (in number of syllables) of words in which this pattern is valid is indicated in the ‘2sizes’ field (see 2.1.6). The ‘notes’ field is used internally during development to keep track of things.
This table associates languages with their relevant phonotactic information. The ‘notes’ field is used internally during development to keep track of things.
This table associates each language with the relevant source id and, in some cases, the page number of the source from which the language information was taken. Full reference text may be found in the Sources table or on the Language Page. The ‘notes’ field is used internally during development to keep track of things.
This table associates each language, ideally, with one Ethnologue language code (ELC) in order to aid language identification. Some of the languages in this database are not listed in this table. These are the languages with, as far as we can tell, no associated Ethnologue code: there was no clear indication of which ELC referred to the language that best corresponded to the one listed in the database. In other instances, several ELCs were possible. In those cases, multiple entries were created for the language, with one possible ELC associated per entry. If you have more information about the appropriate ELC for a language, please e-mail Jeff Heinz: email@example.com.
This table relates languages to the notes that are relevant to them. Topics range from notes on stress patterns, to syllable structure, and some general language notes.
The Syllable Priority Code system was developed by Bailey (1995) as a shorthand for indicating primary stress assignment rules (Heinz, 2007; Bailey, 1995). An online description of his system may be found here (http://www.cf.ac.uk/psych/subsites/ssdb/syllableprioritycode/index.html).
A full review of Heinz’s system may be found on pages 192-194 of Heinz 2007 (http://phonology.cogsci.udel.edu/~heinz/diss/heinz-2007-UCLA-diss.pdf).
The last character of the SPC (L or R) indicates from which edge of the word to begin counting. Thus the initial syllable is designated 1L, the peninitial 2L, the penultimate 2R, and the final syllable 1R. Thus the simplest SPC codes for main stress, such as 1L (Afrikaans), simply mean main stress falls on the initial syllable.
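As a minimal sketch of this convention, the function below converts a simple code such as 1L or 2R into a syllable index for a word of a given length. The function name `edge_position` and the choice of 0-based indices counted from the left are assumptions made for illustration.

```python
def edge_position(code: str, n_syllables: int) -> int:
    """Return the 0-based index (from the left) of the syllable named by a
    simple code such as '1L' (initial) or '2R' (penultimate)."""
    count, edge = int(code[:-1]), code[-1]
    if edge not in ("L", "R"):
        raise ValueError(f"code must end in L or R: {code!r}")
    if not 1 <= count <= n_syllables:
        raise ValueError("the word is too short for this code")
    # Counting from the left starts at index 0; counting from the right
    # starts at the last index and moves leftwards.
    return count - 1 if edge == "L" else n_syllables - count

# In a four-syllable word:
for code in ("1L", "2L", "2R", "1R"):
    print(code, edge_position(code, 4))
```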
Generally, more complex SPCs can be read as a series of if-then-else statements. Slashes indicate a quantity-sensitive rule with rules governing heavier syllables occurring left of the slash. Thus the SPC 12/2L (Maidu) unpacks to the following: If the initial syllable is heavy, it gets stress, else if the peninitial syllable is heavy, it gets stress, else stress falls on the peninitial syllable. If the numbers are suffixed with @s, it means primary stress is assigned if the syllable position carries secondary stress.
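The if-then-else reading of a bounded, quantity-sensitive code can be sketched as follows. This is an illustration only, not Bailey's or the database's implementation: the function name `primary_stress` is hypothetical, syllable weights are assumed to be given as a list of "H" and "L" from left to right, positions are assumed to be single digits, and the @s suffix is not handled. The sketch raises an error when the word is shorter than the default position, since that case is not described above.

```python
def primary_stress(spc: str, weights: list) -> int:
    """Return the 0-based index (from the left) of the primary stress for a
    code such as '12/2L', given per-syllable weights 'H' or 'L'."""
    edge = spc[-1]
    if edge not in ("L", "R"):
        raise ValueError(f"code must end in L or R: {spc!r}")
    heavy_part, default_part = spc[:-1].split("/")
    n = len(weights)

    def index(position: int) -> int:
        # Convert a position counted from the chosen edge to a left index.
        return position - 1 if edge == "L" else n - position

    # Left of the slash: try each listed position in order, heavy only.
    for digit in heavy_part:
        position = int(digit)
        if position <= n and weights[index(position)] == "H":
            return index(position)

    # Right of the slash: the default position, regardless of weight.
    default = int(default_part)
    if default > n:
        raise ValueError("the word is too short for the default position")
    return index(default)

print(primary_stress("12/2L", ["H", "L", "L"]))  # initial syllable is heavy
print(primary_stress("12/2L", ["L", "H", "L"]))  # peninitial is heavy
print(primary_stress("12/2L", ["L", "L", "L"]))  # no heavy: peninitial
```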
Unbounded patterns, where the stress can fall any distance from the word edge, use the 12..89 construct. For example, the SPC for Amele, 12..89/1L, unpacks to the following: If the first syllable counting from the left is heavy then it receives primary stress, else if the second syllable counting from the left is heavy then it receives primary stress . . . otherwise (if there are no heavy syllables) the first syllable counting from the left receives primary stress. Since words are unbounded in length, Bailey (1995) uses ..89 to indicate “and so on” in increasing order for any length. Thus 8 and 9 do not literally mean the 8th and 9th syllables. Rather, 9 means the farthest syllable from the relevant edge, 8 means the next-to-farthest syllable from the relevant edge, and so on. See Bailey (1995) for more details.
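The unbounded case can be sketched in the same spirit. The function below is an illustration under stated assumptions, not the database's code: the name `unbounded_primary_stress` is hypothetical, weights are given as a list of "H" and "L" from left to right, and only codes of the shape 12..89/nL or 12..89/nR with a literal single-digit default are handled.

```python
def unbounded_primary_stress(spc: str, weights: list) -> int:
    """Return the 0-based index (from the left) of the primary stress for a
    code such as '12..89/1L', given per-syllable weights 'H' or 'L'."""
    edge = spc[-1]
    scan_part, default_part = spc[:-1].split("/")
    if scan_part != "12..89" or edge not in ("L", "R"):
        raise ValueError("this sketch only handles 12..89/<n>L or 12..89/<n>R")
    n = len(weights)
    # Syllable indices ordered by distance from the counting edge.
    order = list(range(n)) if edge == "L" else list(range(n - 1, -1, -1))

    # '12..89': scan every syllable outward from the edge; the first heavy
    # syllable found receives primary stress.
    for i in order:
        if weights[i] == "H":
            return i

    # No heavy syllable: fall back to the default position after the slash.
    default = int(default_part)
    if default > n:
        raise ValueError("the word is too short for the default position")
    return order[default - 1]

print(unbounded_primary_stress("12..89/1L", ["L", "L", "H", "L"]))
print(unbounded_primary_stress("12..89/1L", ["L", "L", "L"]))
```

The loop over `order` is what makes the pattern unbounded: it has no fixed upper limit on how far from the edge the stressed syllable may be.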
An SPC followed by (n+) applies only to words that have at least n syllables. Likewise, an SPC followed by (n-) applies only to words that have at most n syllables.
[Heinz (2007) expands Bailey’s system to include secondary stress systems. Included in the expansion are the symbols i, meaning that stress is applied iteratively, @mL and @mR, which mean that stress is applied to the left counting from the main stress, and to the right, counting from the main stress, respectively, H, indicating that stress falls on heavy syllables only, and a system of Hs and Ls in parentheses to describe foot-based stress patterns.] ‘None’ of course means that no secondary stress is present. ‘Not included’ means that source material reports secondary stresses, but that either 1) the source material did not describe it, usually because it was deemed too complex, or 2) the source material did describe it, but the pattern was either unclear or too complicated for me to incorporate into the study due to the usual suspect: time.
Since secondary stress patterns are often iterative (that is, they can be described recursively once the position of one stress is known), secondary stress patterns that can be described iteratively are indicated with the prefix i-. The prefix i2 means the second syllable from a stress receives a stress (in both directions). The first stress is indicated with an SPC suffixed with the @ symbol. Thus i2@1L (Bagandji) indicates secondary stresses fall on odd syllables from the left, whereas i2@2R (Anejom) indicates secondary stresses fall on even syllables from the right. @m means that the first stress upon which the iterative procedure is based is the position of main stress. @mL means the iteration proceeds only leftwards of main stress. Likewise, @mR means the iteration proceeds only rightwards of main stress.
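A minimal sketch of the simplest iterative codes follows. The function name `iterative_positions` is hypothetical, and only codes of the shape i2@1L or i2@2R (a single-digit step and a single-digit edge-anchored starting position) are handled; the @m, @mL and @mR variants and the foot-based notation are left out. The function returns every syllable the iteration lands on and does not remove the syllable that carries primary stress.

```python
import re

def iterative_positions(code: str, n_syllables: int) -> list:
    """Return the 0-based indices (from the left) that an iterative code
    such as 'i2@1L' or 'i2@2R' lands on in a word of n_syllables."""
    match = re.fullmatch(r"i([1-9])@([1-9])([LR])", code)
    if match is None:
        raise ValueError(f"unsupported code: {code!r}")
    step, start = int(match.group(1)), int(match.group(2))
    edge = match.group(3)
    if start > n_syllables:
        # Assumption: if the anchor syllable does not exist, nothing is marked.
        return []
    # Iterating in both directions from the anchor reaches every position
    # (counted from the edge) that differs from it by a multiple of the step.
    first = start % step or step
    from_edge = range(first, n_syllables + 1, step)
    if edge == "L":
        return [p - 1 for p in from_edge]
    return sorted(n_syllables - p for p in from_edge)

print(iterative_positions("i2@1L", 5))  # odd syllables from the left
print(iterative_positions("i2@2R", 5))  # even syllables from the right
```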
When the secondary stress rules are quantity-sensitive, H, L, and X are used to designate heavy, light, and either heavy or light syllables, respectively. Thus a typical trochaic pattern is designated i(‘H,‘LL) and a typical iambic pattern i(H’,LX’). If the iterative procedure begins from the word edge (as opposed to from a particular position), the connective @ is left off and L or R is suffixed to indicate whether the pattern proceeds from the left or right edge, respectively. Thus i(‘H,‘LL)R (Inga) means trochees are iteratively constructed from the right word edge.
Whenever only heavy syllables bear secondary stress, this is indicated with H. Sometimes it is necessary to explicitly mention that secondary stress only precedes main stress (as in cases describable with foot extrametricality), in which case the symbol < is used.