Bayle Shanks's website: notes-conLangs

Of historical interest

http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners (about Ithkuil but gives a history of conlangs too)

"...

John Wilkins, who tried to actualize their lofty ideals. In his “Essay Towards a Real Character, and a Philosophical Language,” from 1668, Wilkins laid out a sprawling taxonomic tree that was intended to represent a rational classification of every concept, thing, and action in the universe. Each branch along the tree corresponded to a letter or a syllable, so that assembling a word was simply a matter of tracing a set of forking limbs until you’d arrived on a distant tendril representing the concept you wanted to express. For example, in Wilkins’s system, De signifies an element, Deb is fire, and Debα is a flame.

Wilkins’s taxonomic-classification scheme, which organized words by meaning rather than alphabetically, was not entirely without use: it was a predecessor of the first modern thesaurus. " -- http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners

" By the nineteenth century, the dream of constructing a philosophical language capable of expressing universal truths had given way to the equally ambitious desire to unite the world through a single, easy-to-learn, politically neutral, auxiliary language. " -- http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners

Esperanto

" Esperanto, which was invented in the eighteen-eighties by L. L. Zamenhof, a Jewish doctor from Białystok, was by far the most successful of a hundred or so universal languages invented in the nineteenth century. At its peak, it had as many as two million speakers, and produced its own rich literature, including more than fifteen thousand books.

Two world wars and the ascent of global English punched an irreparable hole in the Esperantists’ dream of creating a universal language. Like every other attempt to undo the tragedy of Babel, Esperanto was ultimately a failure. And yet, by some estimates, Esperanto still has more speakers than six thousand of the languages spoken around the world today, including approximately a thousand native speakers (among them George Soros) who learned it as their first language. " -- http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners

Loglan

"In 1955, a sociologist and science-fiction writer named James Cooke Brown decided he would test the Sapir-Whorf hypothesis by creating a “culturally neutral” “model language” that might recondition its speakers’ brains.

Brown based the grammar for his ten-thousand-word language, called Loglan, on the rules of formal predicate logic used by analytical philosophers. He hoped that, by training research subjects to speak Loglan, he might turn them into more logical thinkers. If we could change how we think by changing how we speak, then the radical possibility existed of creating a new human condition. " -- http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners

Ithkuil

" “I had this realization that every individual language does at least one thing better than every other language,” he said. For example, the Australian Aboriginal language Guugu Yimithirr doesn’t use egocentric coördinates like “left,” “right,” “in front of,” or “behind.” Instead, speakers use only the cardinal directions. They don’t have left and right legs but north and south legs, which become east and west legs upon turning ninety degrees. Among the Wakashan Indians of the Pacific Northwest, a grammatically correct sentence can’t be formed without providing what linguists refer to as “evidentiality,” inflecting the verb to indicate whether you are speaking from direct experience, inference, conjecture, or hearsay.

Inspired by all the unorthodox grammars he had been studying, Quijada began wondering, “What if there were one single language that combined the coolest features from all the world’s languages?” Back in his room in his parents’ house, he started scribbling notes on an entirely new grammar that would eventually incorporate not only Wakashan evidentiality and Guugu Yimithirr coördinates but also Niger-Kordofanian aspectual systems, the nominal cases of Basque, the fourth-person referent found in several nearly extinct Native American languages, and a dozen other wild ways of forming sentences. " -- http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners

" It was on one of those pilgrimages that he discovered “Metaphors We Live By,” a seminal book, published in 1980, by the cognitive linguists George Lakoff and Mark Johnson, which argues that the way we think is structured by conceptual systems that are largely metaphorical in nature. Life is a journey. Time is money. Argument is war. For better or worse, these figures of speech are profoundly embedded in how we think.

For Quijada, this was a revelation. He imagined that Ithkuil might be able to do what Lakoff and Johnson said natural languages could not: force its speakers to precisely identify what they mean to say. No hemming, no hawing, no hiding true meaning behind jargon and metaphor. By requiring speakers to carefully consider the meaning of their words, he hoped that his analytical language would force many of the subterranean quirks of human cognition to the surface, and free people from the bugs that infect their thinking.

“As time went on, my goal began changing,” he told me. “It was no longer about creating a mishmash of cool linguistic features. I started getting all these ideas to make language work more efficiently. I thought, Why don’t I just create a means of finishing what all natural languages were unable to finish?” " -- http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners

" In 2008, Peterson awarded Ithkuil the Smiley Award for the best invented language of the year. “Few have or, I’m sure, ever will, produce anything as complete and compelling as Ithkuil,” he proclaimed in the award presentation. " -- http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners

" When I met him, Quijada was preparing to deliver a talk on the topic of phonoaesthetics, that hard-to-pin-down quality which gives a language its personality and makes even the most argumentative Italian sound operatic, the most romantic German sound angry, and Yankee English sound like a honking horn. He asked rhetorical questions of the audience, such as “Should my language include diphthongs?” while offering advice like “If you put front vowels in your language, nobody will take it seriously as a language of Orcs.” -- http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners

" Many conlanging projects begin with a simple premise that violates the inherited conventions of linguistics in some new way. Aeo uses only vowels. Kēlen has no verbs. Toki Pona, a language inspired by Taoist ideals, was designed to test how simple a language could be. It has just a hundred and twenty-three words and fourteen basic sound units. Brithenig is an answer to the question of what English might have sounded like as a Romance language, if vulgar Latin had taken root on the British Isles. Láadan, a feminist language developed in the early nineteen-eighties, includes words like radíidin, defined as a “non-holiday, a time allegedly a holiday but actually so much a burden because of work and preparations that it is a dreaded occasion; especially when there are too many guests and none of them help.” " -- http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners

" If you imagine all the possible notions, ideas, beliefs, and statements that a human mind could ever express, Ithkuil provides a precise set of coördinates for pinpointing any of those thoughts. The final version of Ithkuil, which Quijada published in 2011, has twenty-two grammatical categories for verbs, compared with the six—tense, aspect, person, number, mood, and voice—that exist in English. Eighteen hundred distinct suffixes further refine a speaker’s intent. Through a process of laborious conjugation that would befuddle even the most competent Latin grammarian, Ithkuil requires a speaker to home in on the exact idea he means to express, and attempts to remove any possibility for vagueness. " -- http://www.newyorker.com/magazine/2012/12/24/utopian-for-beginners

misc

"voronoff 17 hours ago

For anyone who is interested in what an ideal language would look like, particularly in respect to brevity vs. informativeness I'd highly suggest looking into Terry Regier's work: http://lclab.berkeley.edu/

I worked in his lab on one of many projects showing that most human languages use a near optimal trade-off in various semantic domains (so far - color, kinship, containers, and spatial relations). His work also includes some of the best evidence for some language dependent forces in cognition interacting with some universal ones. " -- https://news.ycombinator.com/item?id=8180924

constructed writing systems (scripts)

https://en.wikipedia.org/wiki/Constructed_script

for communication with non-human animals

See [1]

global etymologies

http://en.wikipedia.org/wiki/Proto-Human_language#Vocabulary

sidebar in http://www.utexas.edu/features/2005/babble/

linguistic universals

http://www.scholarpedia.org/article/Language_%28linguistics%29#What_languages_have_in_common
http://www.uio.no/studier/emner/hf/ikos/EXFAC03-AAS/h05/larestoff/linguistics/Chapter%203.%28H05%29.pdf (has some interesting semantic universals too, eg 'lexical universals')

confusable/ambiguous/recommended/standardized subsets of letters / symbols / alphabets

https://en.wikipedia.org/wiki/United_States_license_plate_designs_and_serial_formats#Skipping_characters
- some states skip I, O, and Q
https://www.ismp.org/newsletters/acutecare/showarticle.aspx?id=81
- has a long list of pairs of letters/numbers that can be confused. Mentions particularly I/1/l, O/0, u/0/4, z/2, 0/4/9/6, S/L
https://en.wikipedia.org/wiki/GSM_03.38
https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt
- eliminates `0', `l', `v', and `2' as potentially confusing
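
A rough sketch of why dropping exactly four characters is convenient: lowercase letters plus digits give 36 candidates, and removing the four confusable ones leaves 32, i.e. 5 bits per symbol. The sorted symbol order and the `encode`/`decode` helpers below are assumptions made for illustration; the linked document defines its own ordering.

```python
import string

# Characters the note above lists as dropped for being easy to confuse.
CONFUSABLE = set("0lv2")

# 26 lowercase letters + 10 digits = 36 candidates; removing 4 leaves 32,
# exactly enough for 5 bits per symbol.
# NOTE: sorted order is a hypothetical choice for this sketch only.
ALPHABET = "".join(c for c in string.ascii_lowercase + string.digits
                   if c not in CONFUSABLE)


def encode(data: bytes) -> str:
    """Encode bytes as base-32 text, 5 bits per symbol, zero-padding the tail."""
    bits = "".join(f"{byte:08b}" for byte in data)
    bits += "0" * (-len(bits) % 5)  # pad to a multiple of 5 bits
    return "".join(ALPHABET[int(bits[i:i + 5], 2)]
                   for i in range(0, len(bits), 5))


def decode(text: str, n_bytes: int) -> bytes:
    """Invert encode(); n_bytes says how many bytes to keep (drops padding)."""
    bits = "".join(f"{ALPHABET.index(ch):05b}" for ch in text)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))


print(len(ALPHABET), ALPHABET)
print(encode(b"hello"))
```

The caller has to remember the byte length here; a real scheme would need some convention for that.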

conscripts (constructed alphabets, etc)

When you look at visual representations of different alphabets (scripts), you notice that the letters within one alphabet often share a distinct style that sets them apart from other alphabets. I wonder whether an alphabet would be harder to confuse if each letter were taken from a different alphabet (or if you looked at what gives different alphabets their different styles and worked from that). Some properties that an ideal alphabet/script would have:

not too few and not too many letters
- "Alphabets typically use a set of 20-to-35 symbols to fully express a language" [2]
- "Linguists use letter frequency analysis as a rudimentary technique for language identification, where it's particularly effective as an indication of whether an unknown writing system is alphabetic, syllabic, or ideographic. For example, the Japanese Hiragana syllabary contains 46 distinct characters, which is more than most phonetic alphabets, e.g. the Hawaiian alphabet which has a mere 13 letters, or English which has 26." [3]
- "As of the most recent change in 2005, there are 107 letters, 52 diacritics and four prosodic marks in the IPA" [4]
  - https://www.internationalphoneticassociation.org/sites/default/files/IPA_Kiel_2015.pdf seems to show 59 pulmonic consonants, 15 non-pulmonic consonants, 12 'other symbols', 28 vowels (sums to 114, which is 7 more than 107)
  - "Most writing systems are not purely one type. The English writing system, for example, includes numerals and other logograms such as #, $, and &, and the written language often does not match well with the spoken one" [5]
- perhaps see also https://en.wikipedia.org/wiki/Letter_frequency
phonetic ("shallow orthography" with "regular spelling")
- (to the extent that it is not phonetic then you have to memorize two things for each word, the auditory representation and the visual representation, whereas for something that is totally phonetic you only have to memorize one)
- ("Some psycholinguists believe that the complexity of a language’s orthography (whether it has a high phoneme-grapheme correspondence or an irregular correspondence in which sounds don’t clearly map to symbols) affects the severity and occurrence of dyslexia, postulating that a more regular system would reduce the number of cases of dyslexia and/or the severity of symptoms") [6]
quick to read
still relatively easy to read if vision is impaired, eg in older people
the letters are not very easily confused with each other (minimize homoglyphs), and in fact are easy to distinguish quickly and correctly (even by simple automated pattern recognition methods, or by dyslexic people)
- how to measure this? The gold standard would be reading speed and reading errors for experienced human readers of the alphabet, but this is expensive to measure. Perhaps a cheaper metric would be errors in reading by simple automated pattern recognition methods. Intermediate in cost would be reading speed and reading errors for human novices.
- some ideas for intermediate target metrics:
  - reduce per-pixel similarity between glyphs
  - reduce similarity between letters when described as strokes or basic shapes
  - also consider similarity between letters when one or the other is displaced or rotated or reflected or scaled (or some combination of these)
  - possibly also consider topological transformations (continuous deformations/homeomorphism, and possibly homotopy, although I think asking for non-homotopy is probably too high of a bar)
  - directly model visual cortex receptive fields (maybe see https://en.wikipedia.org/wiki/Orientation_column , http://www.scholarpedia.org/article/Area_V1 , pinwheels, Gabor functions, convolutional neural nets, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1665021/#__sec2title , http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0066990 , http://www.pnas.org/content/101/43/15524.full , https://web.archive.org/web/20180207192318/http://cs.unm.edu/~karlinjf/papers/stevens01.pdf ) and then do per-field similarity comparison instead of per-pixel
- "Because there is also a visual aspect to dyslexia, affected children often show symptoms such as mirror letter reversal (e.g. confusing "b" and "d"), which can manifest in any language regardless of orthographic depth."
quick to write
quick to write, in cursive
quick to write, in a situation (like carving or like old-style alarm clocks) in which each letter should be made up of a small number of straight lines
does not require color
does not take up too much space
12 numerals (duodecimal)
only capitals
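
A minimal sketch of the "per-pixel similarity" metric from the list above, extended to take the best match over reflections and a 180° rotation. The tiny 5x4 bitmaps are made up for illustration (real glyphs would be rasterized from a font), and "fraction of agreeing pixels" is just one possible similarity measure. 90° rotations are left out because they change the shape of a non-square bitmap.

```python
# Hypothetical 5x4 bitmaps ('#' = ink); not taken from any real font.
GLYPHS = {
    "b": ("#...",
          "#...",
          "###.",
          "#..#",
          "###."),
    "d": ("...#",
          "...#",
          ".###",
          "#..#",
          ".###"),
    "o": ("....",
          "....",
          ".##.",
          "#..#",
          ".##."),
}


def pixel_similarity(a, b):
    """Fraction of pixel positions where two same-sized bitmaps agree."""
    flat_a, flat_b = "".join(a), "".join(b)
    return sum(x == y for x, y in zip(flat_a, flat_b)) / len(flat_a)


def mirror(glyph):
    """Reflect a bitmap left-to-right."""
    return tuple(row[::-1] for row in glyph)


def flip(glyph):
    """Reflect a bitmap top-to-bottom."""
    return tuple(reversed(glyph))


def confusability(a, b):
    """Best per-pixel similarity over identity, both reflections, and 180° rotation."""
    variants = [b, mirror(b), flip(b), flip(mirror(b))]
    return max(pixel_similarity(a, v) for v in variants)


print(pixel_similarity(GLYPHS["b"], GLYPHS["d"]))  # 0.6
print(confusability(GLYPHS["b"], GLYPHS["d"]))     # 1.0
```

With these toy bitmaps, b and d look fairly distinct pixel by pixel but become identical once reflection is allowed, which is the mirror-reversal case mentioned in the dyslexia quote.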

Notes:

Circles containing the Greek, Cyrillic and Latin alphabets, which share many of the same letters, although they have different pronunciations:

all shared: O T H P M A B X K Y E
greek and latin: N Z I
cyrillic and latin: C
latin only: Q J D G S F L U V W R
greek and cyrillic: phi gamma tau

"The language with the most letters is Khmer (Cambodian), with 74 (including some without any current use). According to Guinness Book of World Records, 1995, the Khmer alphabet is the largest alphabet in the world. It consists of 33 consonants, 23 vowels and 12 independent vowels." -- [7]

"most of the alphabets in the old Khmer scripts are now ‘dead’, although some purists cling on to them with dear life (this goes to the other two languages to a lesser extent). Some 60–70 add old Khmer characters are effectively out of use now a days. If you look at the unicode, you will see there are 35 consonants (of even that 2 and arguably more are out of usage, again to my point of hard to kill purists). When you add the 15 stand alone vowels. This would total a 50 at the most." -- [8]

" Thai- 70
Malayalam- 58
Telugu- 56
Sinhala- 54
Bangla- 52
Kannada- 49
Hindi- 44
Hungarian- 44
Abkhaz- 41
Armenian- 39
Albanian- 36
Russian- 33
Azerbaijanian- 32
English- 26
Greek- 24
Hebrew- 22 " -- [9] but "most of scripts currently used in India, Nepal, Sri Lanka, Bhutan, Myanmar, Thailand, Laos and Cambodia are not alphabets, but abugidas." (Konstantin Beloturkin, same webpage)

ASCII has 94 printable characters (not including space, I think); subtracting lowercase characters, this leaves 68 (26 letters, 10 numerals, 32 symbols? Wikipedia gives slightly different numbers so my counts are probably slightly wrong here). "Some popular peripherals only implemented a 64-printing-character subset: Teletype Model 33 could not transmit "a" through "z" or five less-common symbols (` { | } and ~). and when they received such characters they instead printed "A" through "Z" (forced all caps) and five other mostly-similar symbols (@ [ \ ] and ^)." [10]
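
A quick check of those counts, plus a sketch of the folding the quote describes. The rule "subtract 0x20 from anything in 0x60..0x7E" is my assumption: it reproduces the a-z to A-Z mapping and the five symbol substitutions in the quote, but I have not checked that the hardware worked that way. `fold_to_64` is a made-up helper name.

```python
import string

# Printable ASCII excluding space: code points 0x21..0x7E.
printable = [chr(c) for c in range(0x21, 0x7F)]
without_lowercase = [ch for ch in printable if ch not in string.ascii_lowercase]

letters = [ch for ch in without_lowercase if ch in string.ascii_uppercase]
digits = [ch for ch in without_lowercase if ch in string.digits]
symbols = [ch for ch in without_lowercase if not ch.isalnum()]

print(len(printable), len(without_lowercase))   # 94 68
print(len(letters), len(digits), len(symbols))  # 26 10 32


def fold_to_64(ch: str) -> str:
    """Map 0x60..0x7E onto 0x40..0x5E by subtracting 0x20 (assumed rule)."""
    code = ord(ch)
    return chr(code - 0x20) if 0x60 <= code <= 0x7E else ch


print("".join(fold_to_64(ch) for ch in "`{|}~az"))  # @[\]^AZ
```

So the 94 / 68 / 26 / 10 / 32 figures do add up when space is excluded.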

"The ASCII character set is barely large enough for general use, and far too small for universal use. Many more letters and symbols are desirable, useful, or required to directly represent letters of alphabets other than English, more kinds of punctuation and spacing, more mathematical operators and symbols (× ÷ · ≠ ≥ ≈ π etc.), some unique symbols used by some programming languages, ideograms, logograms, box-drawing characters, etc." [11]

"Modified variants of 7-bit ASCII appeared promptly, trading some lesser-used symbols for highly-desired symbols or letters, such as replacing "#" with "£" on UK Teletypes, "\" with "¥" in Japan or "₩" in Korea, etc. At least 29 variant sets resulted. 12 code points were modified by at least one modified set, leaving only 82 "invariant" codes.

There was always temptation to make a larger character set (preferably by extending a standard character set), but there was no easy path forward. There was no way to define a universal extension. There was no hardware to print or display an extended set. For years, applications were designed around the 64-character set and/or the 95-character set, so several characters acquired new uses. (For example, ASCII lacks "÷", so most programming languages use "/" (and sometimes "\") to indicate division, and most still do not recognize "÷". C interprets the two-character sequences "<%" and "%>" as equivalent to "{" and "}" because "{" and "}" are not available on all keyboards.)" [12]

Judging from [13], the 12 ASCII code points that were "modified by at least one modified set" were: # $ @ [ \ ] ^ ` { | } ~
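
The "82 invariant codes" figure can be checked by removing those twelve (# $ @ [ \ ] ^ ` { | } ~) from the 94 printable non-space characters. This is only arithmetic on the list in this note, not a reading of the standard itself.

```python
# The 12 code points varied by at least one national variant, per the note above.
VARIANT = set("#$@[\\]^`{|}~")

printable = {chr(c) for c in range(0x21, 0x7F)}  # 94 characters, space excluded
invariant = sorted(printable - VARIANT)

print(len(VARIANT), len(invariant))  # 12 82
print("".join(invariant))
```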

"Eventually, ISO released this standard as ISO 8859 describing its own set of eight-bit ASCII extensions. The most popular is ISO 8859-1, also called ISO Latin 1, which contained characters sufficient for the most common Western European languages....Microsoft later created code page 1252, a compatible superset of ISO 8859-1 with extra characters in the ISO unused range. Code page 1252 is the standard character encoding of western European language versions of Microsoft Windows, including English versions. ISO 8859-1 is the common 8-bit character encoding used by the X Window System, and most Internet standards used it before Unicode....Because many Internet standards use ISO 8859-1, and because Microsoft Windows (using the code page 1252 superset of ISO 8859-1) is the dominant operating system for personal computers today, unannounced use of ISO 8859-1 is quite commonplace, and may generally be assumed unless there are indications otherwise. " [14]

"UTF-8 has been the dominant character encoding for the World Wide Web since 2009, and as of February 2018 accounts for 90.7% of all Web pages (many of which are simply ASCII, a subset of UTF-8; the next-most popular multibyte encodings, Shift JIS and GB 2312, have 0.8% and 0.6% respectively)." [15]

the additional characters in ISO IEC 8859-1 are: NBSP ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ SHY ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ -- [16]

DOS ( https://en.wikipedia.org/wiki/Code_page_437 ) had some other symbols:

NUL ☺ ☻ ♥ ♦ ♣ ♠ • ◘ ○ ◙ ♂ ♀ ♪ ♫ ☼ ► ◄ ↕ ‼ ¶ § ▬ ↨ ↑ ↓ → ← ∟ ↔ ▲ Ç ü é â ä à å ç ê ë è ï î ì Ä Å É æ Æ ô ö ò û ù ÿ Ö Ü ¢ £ ¥ ₧ ƒ á í ó ú ñ Ñ ª º ¿ ⌐ ¬ ½ ¼ ¡ « » ░ ▒ ▓ │ ┤ ╡ ╢ ╖ ╕ ╣ ║ ╗ ╝ ╜ ╛ ┐ └ ┴ ┬ ├ ─ ┼ ╞ ╟ ╚ ╔ ╩ ╦ ╠ ═ ╬ ╧ ╨ ╤ ╥ ╙ ╘ ╒ ╓ ╫ ╪ ┘ ┌ █ ▄ ▌ ▐ ▀ α ß Γ π Σ σ µ τ Φ Θ Ω δ ∞ φ ε ∩ ≡ ± ≥ ≤ ⌠ ⌡ ÷ ≈ ° ∙ · √ ⁿ ² ■

https://en.wikipedia.org/wiki/ISO/IEC_8859#Table shows the following characters (after sorting and uniquifying):

NBSP SHY LRM RLM

´ ¨ ÷ × ¬ ¦ ° µ ¯ ¡ ¿ · ¸ ‘ ’ “ ” « » § ¶ © ® ¤ ¢ £ ¥ € ₯ ± ، ؛ ؟ ـ ً ٌ ٍ َ ُ ِ ّ ْ ˘ ก ¼ ½ ¾ ¹ ² ³ á Á à À ă Ă â Â å Å ä Ä ã Ã ą Ą ā Ā ª æ Æ ḃ Ḃ ć Ć ĉ Ĉ č Č ċ Ċ ç Ç ď Ď ḋ Ḋ đ Đ ð Ð é É è È ê Ê ě Ě ë Ë ė Ė ę Ę ē Ē ḟ Ḟ ğ Ğ ĝ Ĝ ġ Ġ ģ Ģ ĥ Ĥ ħ Ħ í Í ì Ì î Î ï Ï ĩ Ĩ į Į ī Ī ı İ ĵ Ĵ ķ Ķ ĺ Ĺ ľ Ľ ł Ł ļ Ļ ṁ Ṁ ń Ń ň Ň ñ Ñ ņ Ņ ŋ Ŋ ó Ó ò Ò ô Ô ö Ö ő Ő õ Õ ø Ø ō Ō º œ Œ ṗ Ṗ ĸ ŕ Ŕ ř Ř ŗ Ŗ ś Ś ŝ Ŝ š Š ṡ Ṡ ş Ş ß ť Ť ṫ Ṫ ŧ Ŧ ţ Ţ ú Ú ù Ù ŭ Ŭ û Û ů Ů ü Ü ű Ű ũ Ũ ų Ų ū Ū ẃ Ẃ ẁ Ẁ ŵ Ŵ ẅ Ẅ ý Ý ỳ Ỳ ŷ Ŷ ÿ Ÿ ź Ź ž Ž ż Ż þ Þ ء آ أ إ ا ب ة ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع غ ف ق ك ل م ن ه ؤ و ي ى ئ א ב ג ד ה ו ז ח ט י ך כ ל ם מ ן נ ס ע ף פ ץ צ ק ר ש ת Α α Ά ά Β β Γ γ Δ δ Ε ε Έ έ Ζ ζ Η η Ή ή Θ θ Ι ι Ί ί Ϊ ϊ ΐ Κ κ Λ λ Μ μ Ν ν Ξ ξ Ο ο Ό ό Π π Ρ Σ σ ς Τ τ Υ υ Ύ ύ Ϋ ϋ ΰ Φ φ Χ χ Ψ ψ Ω ω Ώ ώ а А б Б в В г Г д Д ѓ Ѓ ђ Ђ е Е є Є ё Ё ж Ж з З ѕ Ѕ и И і І ї Ї й Й ј Ј к К л Л љ Љ м М н Н њ Њ о О п П р Р с С т Т ќ Ќ ћ Ћ у У ў Ў ф Ф х Х ц Ц ч Ч џ Џ ш Ш щ Щ ъ Ъ ы Ы ь Ь э Э ю Ю я Я

character vs glyph vs grapheme: a character is the abstract unit that a Unicode code point identifies (font-independent). Characters can be not only letters but also numbers, punctuation, etc. A glyph is a particular visual representation of a character; a font is (mostly) a character -> glyph mapping. A grapheme is the analog of a character within some language's writing system ("orthography"); for example, in Spanish, “ch” functions as a single unit, so that is one grapheme. [17]
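
A small sketch of the distinction. The first part uses Python's standard `unicodedata` module to show one visible letter stored as either one or two code points; the second part is a toy grapheme splitter. The `DIGRAPHS` table and `graphemes` helper are hypothetical and are not a full account of Spanish orthography.

```python
import unicodedata

# One visible letter, two possible code point sequences.
composed = unicodedata.normalize("NFC", "e\u0301")   # single code point U+00E9
decomposed = unicodedata.normalize("NFD", "\u00e9")  # 'e' + combining acute
print(len(composed), len(decomposed))                # 1 2

# Hypothetical digraph table for this sketch only.
DIGRAPHS = ("ch", "ll", "rr")


def graphemes(word: str) -> list:
    """Greedy split of a word into graphemes, treating listed digraphs as units."""
    units, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in DIGRAPHS else 1
        units.append(word[i:i + step])
        i += step
    return units


print(graphemes("chico"))  # ['ch', 'i', 'c', 'o']
```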

"Perhaps the primary graphic distinction made in classifications is that of linearity. Linear writing systems are those in which the characters are composed of lines, such as the Latin alphabet and Chinese characters. Chinese characters are considered linear whether they are written with a ball-point pen or a calligraphic brush, or cast in bronze. Similarly, Egyptian hieroglyphs and Maya glyphs were often painted in linear outline form, but in formal contexts they were carved in bas-relief. The earliest examples of writing are linear: the Sumerian script of c. 3300 BC was linear, though its cuneiform descendants were not. Non-linear systems, on the other hand, such as braille, are not composed of lines, no matter what instrument is used to write them.

Cuneiform was probably the earliest non-linear writing. Its glyphs were formed by pressing the end of a reed stylus into moist clay, not by tracing lines in the clay with the stylus as had been done previously. The result was a radical transformation of the appearance of the script.

Braille is a non-linear adaptation of the Latin alphabet that completely abandoned the Latin forms. The letters are composed of raised bumps on the writing substrate, which can be leather (Louis Braille's original material), stiff paper, plastic or metal." [18]

" In trying to develop universally interchangeable character encodings, researchers in the 1980s faced the dilemma that on the one hand, it seemed necessary to add more bits to accommodate additional characters, but on the other hand, for the users of the relatively small character set of the Latin alphabet (who still constituted the majority of computer users), those additional bits were a colossal waste of then-scarce and expensive computing resources (as they would always be zeroed out for such users).

The compromise solution that was eventually found and developed into Unicode was to break the assumption (dating back to telegraph codes) that each character should always directly correspond to a particular sequence of bits. Instead, characters would first be mapped to a universal intermediate representation in the form of abstract numbers called code points. Code points would then be represented in a variety of ways and with various default numbers of bits per character (code units) depending on context. To encode code points higher than the length of the code unit, such as above 256 for 8-bit units, the solution was to implement variable-width encodings where an escape sequence would signal that subsequent bits should be parsed as a higher code point. " [19]
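
To make the code point / code unit split concrete, here is a short sketch that prints, for a few characters, the code point and how many code units two common encodings need. `widths` is a made-up helper; the encodings are the standard UTF-8 and UTF-16 codecs built into Python.

```python
def widths(ch: str):
    """Return (code point, UTF-8 length in bytes, UTF-16 length in 16-bit units)."""
    return ord(ch), len(ch.encode("utf-8")), len(ch.encode("utf-16-be")) // 2


# "A", e-acute, the euro sign, and an emoji outside the Basic Multilingual Plane.
for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    code_point, utf8_len, utf16_len = widths(ch)
    print(f"U+{code_point:04X}  utf-8: {utf8_len} byte(s)  "
          f"utf-16: {utf16_len} code unit(s)")
```

The same abstract code point takes 1 to 4 bytes in UTF-8 and 1 or 2 units in UTF-16, which is the variable-width idea the quote describes.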

"Examples of characters include letters, numerical digits, common punctuation marks (such as "." or "-"), and whitespace." [20]

"Most IAL writing systems use only letters from the ISO basic Latin alphabet." [21]

"Combatting Dyslexia with typography...In 2006, Dr. Robert Hillier...type families Sylexiad and Dine...relatively light letter strokes; relatively long ascender and descenders; upper-case characters; generous spacing; clear differentiation between characters; availability of both serif and sans-serif versions. In 2008, ... Christian Boer...Dyslexie...this common cognitive processing disability may cause some of those who have it to perceive glyphs rotated, ...flipped,...letters were given heavier stroke weights at their baselines than at their tops, counter shapes were varied asymmetrically, some letters' stems were tipped, ascender and descender heights were varied, and so were counter sizes. Glyphs were designed with tall x-heights and wide side-bearings to add negative space and avoid crowding... capital letters and punctuation marks were emboldened...2011...Abelardo Gonzalez...open source...Open Dyslexic...bottom-heavy glyphs, wide letterspacing, unique letter shapes...research...at the University of Twente showed Dyslexie to improve reading...Other research, however, has shown that familiarity with a given typeface may have just as strong of an impact on reading comprehension as glyph or word shape" [22]

" Equivalence classes of the English (i.e., Latin) alphabet (sans-serif): Homeomorphism {A,R} {B} {C,G,I,J,L,M,N,S,U,V,W,Z}, {D,O} {E,F,T,Y} {H,K}, {P,Q} {X}

Homotopy equivalence {A,R,D,O,P,Q} {B}, {C,E,F,G,H,I,J,K,L,M,N,S,T,U,V,W,X,Y,Z}

An introductory exercise is to classify the uppercase letters of the English alphabet according to homeomorphism and homotopy equivalence. The result depends partially on the font used. The figures use the sans-serif Myriad font. Homotopy equivalence is a rougher relationship than homeomorphism; a homotopy equivalence class can contain several homeomorphism classes. The simple case of homotopy equivalence described above can be used here to show two letters are homotopy equivalent. For example, O fits inside P and the tail of the P can be squished to the "hole" part.

Homeomorphism classes are:

    no holes corresponding with C, G, I, J, L, M, N, S, U, V, W, and Z;
    no holes and three tails corresponding with E, F, T, and Y;
    no holes and four tails corresponding with X;
    one hole and no tail corresponding with D and O;
    one hole and one tail corresponding with P and Q;
    one hole and two tails corresponding with A and R;
    two holes and no tail corresponding with B; and
    a bar with four tails corresponding with H and K; the "bar" on the K is almost too short to see.

Homotopy classes are larger, because the tails can be squished down to a point. They are:

    one hole,
    two holes, and
    no holes.

To classify the letters correctly, we must show that two letters in the same class are equivalent and two letters in different classes are not equivalent. In the case of homeomorphism, this can be done by selecting points and showing their removal disconnects the letters differently. For example, X and Y are not homeomorphic because removing the center point of the X leaves four pieces; whatever point in Y corresponds to this point, its removal can leave at most three pieces. The case of homotopy equivalence is harder and requires a more elaborate argument showing an algebraic invariant, such as the fundamental group, is different on the supposedly differing classes.

Letter topology has practical relevance in stencil typography. For instance, Braggadocio font stencils are made of one connected piece of material. " -- [23]
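
The relationship between the two classifications can be checked mechanically: tag each homeomorphism class from the quote with its number of holes, then merge classes with the same count. This only re-derives the quoted lists (which depend on the font); it proves nothing topological.

```python
from collections import defaultdict
import string

# Homeomorphism classes as quoted above, tagged with their number of holes.
CLASSES = [
    (0, "CGIJLMNSUVWZ"),  # no holes
    (0, "EFTY"),          # no holes, three tails
    (0, "X"),             # no holes, four tails
    (0, "HK"),            # bar with four tails
    (1, "DO"),            # one hole, no tail
    (1, "PQ"),            # one hole, one tail
    (1, "AR"),            # one hole, two tails
    (2, "B"),             # two holes
]

# Homotopy equivalence ignores tails, so classes merge by hole count.
homotopy = defaultdict(str)
for holes, letters in CLASSES:
    homotopy[holes] += letters

for holes in sorted(homotopy):
    print(holes, "".join(sorted(homotopy[holes])))

# Sanity check: every uppercase letter appears exactly once.
all_letters = "".join(letters for _, letters in CLASSES)
print(sorted(all_letters) == list(string.ascii_uppercase))
```

The merged groups match the three homotopy classes in the quote: no holes, one hole, two holes.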

unicode character frequencies: http://unicode.org/L2/L2009/09180-char-frequency.pdf

arabic letter frequencies: https://en.wikipedia.org/wiki/Arabic_letter_frequency

dendrogram of 1-gram similarities in writing systems: Michael Cysouw, "Predicting language-learning difficulty", in Approaches to Measuring Linguistic Differences, edited by Lars Borin and Anju Saxena

or http://cysouw.de/home/articles_files/cysouwPREDICTING.pdf

in the previous, the scripts with multiple languages clustered together are:

latin
cyrillic
arabic
devanagari (hindi)
chinese

incidentally, that paper is interesting in its own right, although not relevant here. Its conclusion is that the following objective factors are most predictive of language learning difficulty (for English speakers):

number of similar WALS attributes between two languages divided by the number of WALS attributes compared (which is 139, because the sign language and writing system attributes in WALS are left out)
orthographic similarity, which appears to mean the similarity coefficient across character frequencies (characters are compared as Unicode characters, so what is counted is how far two languages use the same characters; the corpus used is translations of the Universal Declaration of Human Rights)
Does the language have a Latin script? If not, does it have an Indo-European script?
prepositions (WALS 85, "Order of adposition and noun phrase" (Dryer 2005c); English has "prepositions precede the noun phrase they occur with", but many other languages instead have adpositions (called 'postpositions') that follow their phrase [24])
accusative (WALS 100, "Alignment of Verbal Person Marking": "English uses accusative alignment, i.e. the intransitive subject and the transitive subject trigger the same inflection. This is the most widespread strategy from a world-wide perspective. Other approaches, like ergative or active alignment, make a language more difficult to learn for English speakers")
nominal plural (WALS 34; English has value "All nouns, always obligatory", which is the most common across languages; difficulty arises when "nouns with a plural meaning are not always obligatorily marked as such" (parameter 34, Haspelmath 2005c))
WALS parameters as discussed above with reference to figure 2, i.e. parameters 93, 116, 100, 101, 52, 64, 85, 90, 94, 27, and 34
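
A rough sketch of the first two factors. I don't know which similarity coefficient the paper uses, so cosine similarity over letter counts is an assumption, as are the helper names and the toy WALS values. The sample sentences are the opening of UDHR Article 1 as I remember it, so treat the exact wording as approximate.

```python
from collections import Counter
from math import sqrt


def char_frequencies(text: str) -> Counter:
    """Count letters only, case-folded; other characters are ignored."""
    return Counter(ch for ch in text.casefold() if ch.isalpha())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two character-count vectors (assumed metric)."""
    dot = sum(a[ch] * b[ch] for ch in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def wals_similarity(x: dict, y: dict) -> float:
    """Share of attributes recorded for both languages that have equal values."""
    shared = x.keys() & y.keys()
    return sum(x[k] == y[k] for k in shared) / len(shared) if shared else 0.0


english = char_frequencies("All human beings are born free and equal in dignity and rights.")
german = char_frequencies("Alle Menschen sind frei und gleich an Würde und Rechten geboren.")
russian = char_frequencies("Все люди рождаются свободными и равными в своем достоинстве и правах.")

print(cosine_similarity(english, german))   # high: both use Latin letters
print(cosine_similarity(english, russian))  # 0.0: no code points in common

# Toy attribute values, made up for illustration.
print(wals_similarity({"85": "prepositions", "34": "obligatory"},
                      {"85": "postpositions", "34": "obligatory"}))  # 0.5
```

Because the comparison is over Unicode code points, Cyrillic letters that look like Latin ones still count as different characters.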

some conscripts:

on beyond zebra alphabet:

misc

http://appliedeconomist.blogspot.com/2012/03/language-and-savings.html : "languages which do not grammatically distinguish between present and future events (what linguists call weak-FTR languages) lead their speakers to take more future-oriented actions"

http://appliedeconomist.blogspot.com/2014/01/gdp-per-capita-by-language.html

https://en.wikipedia.org/wiki/Pasigraphy

https://en.wikipedia.org/wiki/Template:Constructed_languages