10. Character tables (character sets)
To represent the symbols of a language, the computer uses 1 byte = 8 bits, which gives 2^8 = 256 different characters. The ASCII code (American Standard Code for Information Interchange) strictly defines only the first 128 symbols (7 bits). The other half is used for special symbols of other languages and for graphic symbols. Unlike other European languages, Greek lies entirely in the 8-bit half (codes 128-255). The obvious reason is that the Greek language has many more distinct symbols than the others.
Additional information about Greek on the Internet can be found in RFC 1947, "Greek Character Encoding for Electronic Mail Messages". See http://andrew2.andrew.cmu.edu/rfc/rfc1947.html
10.1 Greek encoding standards
Greek exists in many different encodings. The most common are 737 and 928. Both are for monotonic Greek. 737 is used by DOS, while 928 is used by all UNIX systems and by Windows (with small variations). Linux uses 928 as its main code page. Having two or more standards for Greek is, of course, a big problem, which is worked around with special converters that translate from one set to the other.
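As a rough illustration of what such a converter does, here is a minimal Python sketch. It assumes that Python's standard codecs `cp737` and `iso8859_7` are acceptable stand-ins for the 737 and 928 code pages; the function name `convert_737_to_928` is made up for this example and does not come from any of the tools mentioned in this chapter.

```python
def convert_737_to_928(data: bytes) -> bytes:
    """Re-encode DOS Greek (737) bytes as ELOT 928 / ISO 8859-7 bytes."""
    # Decode to Unicode first, then encode in the target code page.
    text = data.decode("cp737")
    # errors="replace" turns characters that 928 lacks (for example the
    # DOS box-drawing symbols) into "?" instead of raising an exception.
    return text.encode("iso8859_7", errors="replace")


sample_737 = "Ελληνικά".encode("cp737")
sample_928 = convert_737_to_928(sample_737)
print(sample_737.hex(), "->", sample_928.hex())
```

The same text ends up as a different byte sequence in each code page, which is why a file written under DOS looks like garbage when it is read as 928.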
Oracle's documentation for Linux and the server manuals list the widespread Greek standards used in databases (and therefore in the most important computer systems), together with their standardized (yet again?) names:
- EL8ISO8859P7: ISO 8859-7, for UNIX and the Internet. Also known as 928 and Latin7.
- EL8MSWIN1253: Windows Greek
- EL8PC737: DOS Greek
- EL8MACGREEK: the Macintosh uses ELOT-823 (IBM 851).
- EL8MACGREEKS
- EL8PC437S
- EEC8EUROPA3
- EL8EBCDIC875: IBM mainframe Greek character set
- EL8DEC: presumably the DEC VAX/VMS Greek. (Any old-timer who knows?)
10.2 737
737 is also known as 437G (= 437 Greek), because it was derived by modifying the American code page 437. It first appeared in the Greek EPROMs of the MDA and Hercules graphics cards of the first PCs, that is, it lived in the HARDWARE. It was used extensively under DOS, so any file that comes from DOS can be expected to be in 737. Since 737 is now considered a DOS leftover, it is better to convert 737 files to 928; see convertgreek. On Linux, code page 737 is fully supported only on the console (text mode), although a few fonts for X-Windows exist as well.
Kernel modification for 737 support
There have been reports that "δ" (lowercase delta) cannot be typed under some kernels. This happens because its code coincides with 128+ESC (128+27 = 155, which is the code of "δ" in 737). Go to /usr/src/linux/drivers/char/console.c and find the place where it says:
    && (c != 127 || disp_ctrl) && (c != 128+27);
Change it to:
    && (c != 127 || disp_ctrl) /* && (c != 128+27) */;
and compile a new kernel.
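The arithmetic behind this clash can be checked with a few lines of Python. This is only a sanity check, assuming Python's `cp737` codec matches the 737 table used on the console; it has nothing to do with the kernel code itself. Byte 155 (0x9B) is also the 8-bit CSI control code, which is presumably why the console singles it out.

```python
ESC = 27
code = 128 + ESC                       # 155, i.e. 0x9B

# Look up which character code page 737 stores at that position.
delta = bytes([code]).decode("cp737")

print(code, hex(code), delta)
```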
737 in X-Windows
737 is supported by some of the fixed fonts in the Grafis package: graphis .
[ah@computer.org]'s report for names (from xlsfonts):
    -misc-grfixed-medium-r-normal--0-0-75-75-c-0-grpc-737
    -misc-grfixed-medium-r-normal--0-0-85-85-m-0-grpc-737
    -misc-grfixed-medium-r-normal--14-110-75-75-c-75-grpc-737
    -misc-grfixed-medium-r-normal--16-120-75-75-c-75-grpc-737
    -misc-grfixed-medium-r-normal--23-179-85-85-m-120-grpc-737
    -misc-grfixed-medium-r-semicondensed--0-0-75-75-c-0-grpc-737
    -misc-grfixed-medium-r-semicondensed--10-100-75-75-c-60-grpc-737
    -misc-grfixed-medium-r-semicondensed--13-120-75-75-c-60-grpc-737
    -misc-grvga-medium-r-normal--0-0-75-75-c-0-grpc-737
    -misc-grvga-medium-r-normal--13-120-75-75-c-60-grpc-737
(I think some of them have bugs, and I intend to fix them in the next release.)
10.3 928
Greek 928 is the most modern and widespread encoding and was originally established by ELOT. It was later adopted by ISO as ISO-Latin-8859-7, or simply Latin7; even the Unicode support for Greek is based on it. 928 is used by all UNIX applications and on the Internet, and it is today's standard for Linux as well. The 928 standard is supported both on the console (text mode) and in the graphical environment (X-Windows).
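One way to see how closely the Unicode layout follows 928 is to compare byte values with code points. The sketch below assumes Python's `iso8859_7` codec as the 928 table and looks only at the letter positions 0xC1 to 0xFE (0xD2 is unassigned and is skipped); for these positions the difference between the Unicode code point and the 928 byte comes out as a single constant.

```python
# Letter positions in ISO 8859-7; 0xD2 is unassigned, so it is skipped.
letter_bytes = [b for b in range(0xC1, 0xFF) if b != 0xD2]

# Collect every distinct (code point - byte value) difference.
offsets = {ord(bytes([b]).decode("iso8859_7")) - b for b in letter_bytes}

print(sorted(hex(o) for o in offsets))
```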
Windows-1253
The main deviation of Windows Greek (Windows-1253) from the ELOT 928 standard is the character "Ά" (accented capital alpha) of 928, whose position corresponds to the paragraph mark under Windows. Windows-1253 is also reported to lack the ano teleia and the Greek quotation marks « and », although the published Windows-1253 table does carry «, » and the middle dot at the same positions as 928. Since we inevitably have to accept this limitation imposed by MS-Windows, and since quite a few users work on the wintel platform, it is best to avoid the accented capital alpha in e-mails, postings and so on. Alternatively, you can write 'A (where ' = SHIFT+"). Similar problems are reported with 'E and 'O. For your convenience, these are all the accented capitals of 928: Ά Έ Ή Ί Ό Ύ Ώ.
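The clash can be reproduced with a short Python sketch. It assumes that the standard `iso8859_7` and `cp1253` codecs are faithful to 928 and Windows-1253 respectively; `differing_bytes` is a helper invented for this example that lists every high position where the two tables disagree.

```python
byte_928 = "Ά".encode("iso8859_7")     # where 928 stores accented alpha
as_windows = byte_928.decode("cp1253")  # the same byte read as Windows-1253
byte_1253 = "Ά".encode("cp1253")        # where Windows-1253 stores it


def differing_bytes():
    """Yield (byte, char_in_928, char_in_1253) where the tables disagree."""
    for b in range(0xA0, 0x100):
        raw = bytes([b])
        # Unassigned positions decode to U+FFFD instead of raising.
        c928 = raw.decode("iso8859_7", errors="replace")
        c1253 = raw.decode("cp1253", errors="replace")
        if c928 != c1253:
            yield b, c928, c1253


print(byte_928.hex(), repr(as_windows), byte_1253.hex())
for b, c928, c1253 in differing_bytes():
    print(f"0x{b:02X}: 928={c928!r} 1253={c1253!r}")
```

Running the loop is a quick way to find out which characters are actually unsafe to exchange between the two systems.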
10.4 Unicode
Unicode (ISO 10646) was designed as a 16-bit code (i.e. 65536 combinations) and covers many languages, including modern Greek, which starts at offset #370, and ancient Greek, which starts at offset #1F00. Greek is supported all the way from modern to ancient (polytonic) Greek, and even Linear B! Linux supports Unicode internally, but its use is not yet widespread, because it also depends on applications adopting it. For more, see: http://linuxdoc.org/HOWTO/Unicode-HOWTO.html
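The two offsets mentioned above correspond to the Unicode blocks "Greek and Coptic" (U+0370 to U+03FF) and "Greek Extended" (U+1F00 to U+1FFF). A minimal Python sketch, with `greek_block` as a made-up helper name, shows where a few characters land:

```python
import unicodedata


def greek_block(ch: str):
    """Return the name of the Greek Unicode block holding ch, or None."""
    cp = ord(ch)
    if 0x0370 <= cp <= 0x03FF:
        return "Greek and Coptic"      # monotonic (modern) Greek
    if 0x1F00 <= cp <= 0x1FFF:
        return "Greek Extended"        # polytonic (ancient) Greek
    return None


for ch in ("α", "Ω", "\u1f00", "a"):
    print(f"U+{ord(ch):04X}", greek_block(ch), unicodedata.name(ch))
```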
====================================================================
Vasilis Vasaitis <vas@hal.csd.auth.gr>:
Although I have not dealt with the subject extensively, I can contribute what I know about it. So, here goes:

Some time ago I downloaded a Unicode fixed font for X Windows. Since I rarely delete what I download, I found it still sitting on my disk. This font does not contain the full Unicode, since that consists of more than 38000 characters, most of which are Chinese/Japanese/Korean and would not fit in the 6x13 cell of fixed anyway. However, with about 2800 characters (in the version I have, at least) it fully covers the Latin, Greek, Cyrillic, Armenian, Georgian and Hebrew scripts, plus some technical and mathematical symbols. This font can serve as a model for anyone interested in designing fonts with many characters; for more practical applications, see below. The page of its author, if it is still the same, is: http://www.cl.cam.ac.uk/~mgk25/

Console support: The console has supported Unicode for ages, through UTF8 of course (for those who do not know, UTF8 is a variable-length representation of Unicode which, for the first 128 characters, is identical to ASCII). The catch is that VGA support for characters displayed at the same time is very limited anyway (256, or 512 without blinking).

X support: The font mentioned above works fine, and the last time I tried it was a long time ago. I also happen to have an X server with built-in TrueType font support (I do not load a font server), and I see that TrueType fonts work fine too. For those who do not know, XFree86 4.0 will come with built-in TrueType support. In its fonts Microsoft (I have none from any other company) uses the Windows Glyph List 4 (WGL4), which is a subset of ISO 10646-1 (more or less what the font I described at the beginning contains).

Applications: This is where everything falls apart. Right now there are a couple of programs that convert from/to UTF8, plus yudit and Netscape, which gather fonts from here and there to find enough Unicode symbols, and beyond that there is chaos. In any case, it would be good to start an effort on the fonts, and I imagine the applications will try to follow.
---------------
Report from Panagiotis Vryonis:
I know that Giannis Gyftomitros <yang@hellug.gr> has already started working on the possibility of creating Unicode fonts that also contain Greek (Project Grafis, see GRArial etc.), and he may have progressed further...

Since version 6.0, the XFS included in Red Hat carries a patch that lets it display TrueType fonts. See the relevant "White Paper" in the "support" section of http://www.redhat.com/ . If you install Unicode TrueType fonts (e.g. those of M$) they work, in the sense that the fonts show up as available with a thousand or two different encodings. I do not know, however, whether they also work as a Unicode font, e.g. for viewing a text with Greek, English and Chinese at the same time in Netscape.
=====================================================================
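The description of UTF-8 in the report above (variable length, identical to ASCII for the first 128 characters) can be seen directly in Python. This is only an illustration; `utf8_length` is a helper name invented here.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


# An ASCII letter, a monotonic Greek letter and a polytonic Greek letter.
for ch in ("A", "α", "\u1f00"):
    print(f"U+{ord(ch):04X} -> {utf8_length(ch)} byte(s): {ch.encode('utf-8').hex()}")

# Plain ASCII text is byte-for-byte the same in UTF-8.
ascii_text = "Linux"
same_as_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
print(same_as_ascii)
```

Greek text therefore takes two bytes per letter in UTF-8 (three for polytonic letters), compared with one byte per letter in 737 or 928.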
Unicode Links
There is a fixed font for X Windows; see: http://www.cl.cam.ac.uk/~mgk25/ucs-fonts.html
There is also a text editor for Unicode, called Yudit: ftp://metalab.unc.edu/pub/Linux/apps/editors/X/yudit-1.1.tar.gz
UTF-8 is now a standard on the Internet; see the relevant RFC: http://andrew2.andrew.cmu.edu/rfc/rfc2279.html
More about modern Greek in Unicode here: http://charts.unicode.org/Unicode.charts/normal/U0370.html
10.5 Greek converters
gr2gr
Angelos Haritsis <ah@computer.org> has written this converter: ftp://ftp.hri.org/pub/greek/programs/gr2gr.prl It runs under perl (5 or 4), so it works on any operating system where perl is installed (unix, dos, win32, os2, mac, vms ...).
It supports many different Greek encodings, such as:
- 928: ELOT 928
- 437: IBM 437 (*default* input)
- lat: latin greek form, aka greeklish (*default* output)
- 437b: IBM 437b
- win: Windows
- mac: Macintosh
- 851: 851
- 869: 869
- quad: quadtek
- sym: standard Symbol font codes (English garbled)
- wgr: WinGreek (Windows shareware prg) encoding
- troff: troff symbol font escape sequences, no diacritics (dialytika)
- kdtex: Dryllerakis TeX (only conversions _to_ kdtex work)
- ibytex: ibygrk TeX (only conversions _to_ ibygrk work)
grfilter
The Computer Technology Institute hosts grfilter: ftp://ftp.cti.gr/pub/src/grfilter.tar
greek2lat
The directory ftp://corfu.forthnet.gr/pub/greek2lat contains a converter from 928 to greeklish, which is also suitable for WEB sites.
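To give an idea of what a 928-to-greeklish conversion involves, here is a minimal Python sketch. The mapping table is a hypothetical one-to-one "visual" greeklish scheme chosen for this example; it is not the table used by greek2lat, whose rules are not described here. The names `GREEKLISH`, `to_greeklish` and `convert_928_to_greeklish` are likewise made up.

```python
import unicodedata

# Hypothetical one-to-one "visual" greeklish table (an assumption, not
# the mapping of any particular tool).
GREEKLISH = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "n", "θ": "8", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "v", "ξ": "3", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "u", "φ": "f", "χ": "x", "ψ": "y",
    "ω": "w",
}


def to_greeklish(text: str) -> str:
    """Transliterate Greek letters to Latin, dropping accents."""
    # NFD splits an accented letter into base letter + combining marks,
    # so skipping the marks removes tonos and dialytika.
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            continue
        latin = GREEKLISH.get(ch.lower())
        if latin is None:
            out.append(ch)              # not a Greek letter: keep as is
        elif ch.isupper():
            out.append(latin.upper())
        else:
            out.append(latin)
    return "".join(out)


def convert_928_to_greeklish(data: bytes) -> bytes:
    """Decode 928 bytes and return 7-bit ASCII greeklish."""
    text = data.decode("iso8859_7")
    return to_greeklish(text).encode("ascii", errors="replace")


print(to_greeklish("Ελληνικά"))
```

Because the result is pure 7-bit ASCII, it survives any mail gateway or browser, at the cost of losing accents and some readability.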
trans120.tar.gz
Kostas Kostis <kosta@kostis.net> has also written this converter, which supports many Greek encodings as well as other languages: http://www.kostis.net/freeware/trans120.tar.gz
gkconv
There is also a program by Giorgos Spiliotis, which converts between 437, Win95 and X Windows Greek. Its address is unknown.
recode
This is a small general-purpose program from the GNU project, which supports conversions for many different languages (Greek included). Perhaps all the other programs should at some point be merged into it. See http://www.delorie.com/gnu/docs/recode/recode_toc.html
10.6 File types and their conversion
- .txt, .doc
Depends on the case; see convertgreek
- .dbf
Usually 737. Conversion needs care; leave it to a guru.
- .diz
Usually 737; see convertgreek
- .html
These must be 928, and then they display normally.
- .mov, .avi
If it has Greek subtitles, it will be OK :-)
- .exe, .com
Throw them away.
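When it is unclear whether a text file is 737 or 928, a crude byte-count heuristic can give a first hint. The sketch below is an assumption of this example, not a method used by convertgreek: it relies on the fact that in 737 most Greek letters sit in the range 0x80-0xAF, while in 928 they sit in 0xB6-0xFE, and it ignores every other encoding (Windows-1253, UTF-8 and so on). The function name `guess_greek_codepage` is hypothetical.

```python
def guess_greek_codepage(data: bytes) -> str:
    """Guess '737', '928' or 'unknown' from where the high bytes fall."""
    # Bytes typical of 737 letters (capitals and most lowercase).
    low = sum(1 for b in data if 0x80 <= b <= 0xAF)
    # Bytes typical of 928 letters (and of 737 accented vowels, which is
    # why this is a majority vote and not a strict test).
    high = sum(1 for b in data if 0xB6 <= b <= 0xFE)
    if low == high:
        return "unknown"                # includes pure ASCII input
    return "737" if low > high else "928"


print(guess_greek_codepage("καλημέρα".encode("cp737")))
print(guess_greek_codepage("καλημέρα".encode("iso8859_7")))
```

A guess like this should always be confirmed by eye before converting anything important, which is one more reason to leave .dbf files to someone who knows their structure.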
10.7 What else is there on the Internet about Greek?
Useful links:
- Greek Fortune: http://kronos.eng.auth.gr/~arvan/fortunes/
- Virgo help on greek: http://www.virgo.gr/baza/greek.html
- Hellenic Resources Institute (HRI): http://www.hri.org
- Font tables at HRI: http://www.hri.org/fonts/unix/pinakec.html
- fonts@argeas: ftp://argeas.hellug.gr/pub/unix/linux/GREEK/fonts/
- fonts@HRI: ftp://ftp.hri.org/pub/greek/fonts/x-win/
- Panagiotis Vryonis's page on fonts: http://users.hol.gr/~vrypan/cactus/grfonts-1.html
- Unix Greek Language Software (old): http://www.cs.columbia.edu/~akonstan/en/greek/software.html
- Unicode Organization: http://www.unicode.org
- I18N FAQ: http://www.vlsivie.tuwien.ac.at/mike/i18n.html
- ISO fonts: ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/ISO-fonts
- International fonts: ftp://ftp.vlsivie.tuwien.ac.at/pub/8bit/i18n-fonts
- Using 8 bit characters: ftp://ftp.ulg.ac.be/pub/docs/iso8859/
- ISO-8859 sets: http://wwwwbs.cs.tu-berlin.de/~czyborra/charsets/
- Much charactersets info (kermit?): ftp://kermit.columbia.edu/kermit/charsets/
- http://www.ora.com/homepages/comp.fonts/