Translation to other languages?

Discussion about the project in general, organization, website, or any other details that aren't directly about the game.
Blade Runner
Space Squid
Posts: 53
Joined: Fri Sep 05, 2003 8:47 pm

#31 Post by Blade Runner »

IMHO we have to abandon the idea of any kind of automatic translation for the game. The only way is to build a text database and translate every text on every occasion.
I think one example can show how impossible it is to translate (with some kind of automation) from English to Hungarian:
The English sentence:
We bring some dinner for your dogs.
Translated to Hungarian:
A kutyáitoknak hoztunk egy kis vacsorát.
First of all, the starting "We" disappears, because the verb contains the person:
So the starting „We bring” turns into „hoztunk” and goes to the middle of the Hungarian sentence.
In Hungarian we have a so-called object mode, which means we can indicate in our sentences, just with endings, whether the object of the sentence is something unique or not. The translator must know what the object is to translate this accurately, because the English sentence cannot contain that kind of information. It goes onto the end of the word dinner -> „vacsora”, which turns into „vacsorát”. :twisted:
And finally the word „kutyáitoknak” literally means „for your dogs” in reverse order, attached to the end of the base word (dog means kutya), with some changes to the vowels, which also follow a quite difficult rule system.
The easiest part of the procedure is to translate the word "some" to "kis" and put it before the word "vacsorát" (meaning "dinner").
All this hassle with a short basic sentence! :D
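
To make the text database idea concrete, here is a rough sketch in Python. It is not FreeOrion code: the label DINNER_FOR_DOGS, the table layout and the lookup function are all made up for illustration. Each whole sentence is stored under a label and translated by hand, so nothing is assembled word by word:

```python
# Hypothetical string database: one hand-translated sentence per label and language.
STRINGS = {
    "en": {"DINNER_FOR_DOGS": "We bring some dinner for your dogs."},
    "hu": {"DINNER_FOR_DOGS": "A kutyáitoknak hoztunk egy kis vacsorát."},
}


def lookup(label: str, lang: str, fallback: str = "en") -> str:
    """Return the sentence for `label` in `lang`, or the fallback language if missing."""
    table = STRINGS.get(lang, {})
    if label in table:
        return table[label]
    return STRINGS[fallback][label]


print(lookup("DINNER_FOR_DOGS", "hu"))
```

A language with no entry for a label simply gets the English sentence, so an incomplete translation still shows something readable.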
Last edited by Blade Runner on Tue Jan 11, 2005 7:11 am, edited 1 time in total.
-------------------------------------------
Te vagy a Blade Runner. :)

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#32 Post by Geoff the Medio »

I think the suggestion for search-and-replace translation was meant only for use between American and British English, in which case the only changes are (assumed to be) spelling, which are simple enough to do with search-and-replace methods.
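
Something like this minimal Python sketch is what I have in mind for the spelling-only case. The word list and the function name are invented for the example, and a real table would be much longer:

```python
import re

# Hypothetical spelling table (American -> British); purely illustrative.
US_TO_UK = {"color": "colour", "armor": "armour", "defense": "defence"}

# Match whole words only, so "colorful" is left alone by this simple table.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, US_TO_UK)) + r")\b", re.IGNORECASE
)


def to_british(text: str) -> str:
    """Swap American spellings for British ones, keeping a leading capital."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = US_TO_UK[word.lower()]
        return replacement.capitalize() if word[0].isupper() else replacement

    return _PATTERN.sub(swap, text)


print(to_british("Armor color and planetary defense"))
```

This only works because the two variants share word order and grammar; it says nothing about translating between unrelated languages.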

Blade Runner
Space Squid
Posts: 53
Joined: Fri Sep 05, 2003 8:47 pm

#33 Post by Blade Runner »

Geoff the Medio wrote:I think the suggestion for search-and-replace translation was meant only for use between American and British English, in which case the only changes are (assumed to be) spelling, which are simple enough to do with search-and-replace methods.
If that is the case I am happy, but e.g. SMAC cannot be translated into Hungarian. I checked the translation tables and it was impossible to do a proper job, because of the gender part (which we don't have), the word order (we have a different one) and of course the so-called "object mode". :)

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#34 Post by Geoff the Medio »

Blade Runner wrote:If that is the case I am happy, but e.g. SMAC cannot be translated into Hungarian. I checked the translation tables and it was impossible to do a proper job, because of the gender part (which we don't have), the word order (we have a different one) and of course the so-called "object mode". :)
I don't quite follow what you're saying there...

What is the problem with word order? Obviously different languages have different orderings of equivalent words, but why can't you just write the translation in the appropriate order?

I'm not sure what object mode is... Is it a modification of the object noun of the sentence according to context, similar to gender or conjugation for pronouns and verbs? Edit: Er, looks like it's more like a modification of the verb based on the type of object...? /Edit

Do you mean you couldn't translate SMAC because Hungarian doesn't have "the gender part", or because Hungarian does, and the file you were translating from didn't? What exactly is "the gender part"? (see below for possible answer...)

I can imagine where gender issues for verbs, articles and pronouns and such could be problematic in making up stringtables... something like:

REQUEST_TEXT
Please give me the %1%.

Where "the" would have different translations in languages with gendered or pluralized articles (eg. French: le, la or les), depending on %1%, whereas no such ambiguity exists in English.

This could be resolved by having the %1% be replaced by (for example) "the klingons" instead of just "klingons" or "klingon empire", though I suppose it might be impossible to do this if other strings referred to the same string, but weren't set up in such a way that the "the" could be included in the translated version of the appropriate string (in a different one from the english version which doesn't have the problem)...

A related problem in english, and presumably other languages, would be a string like:

OTHER_REQUEST_TEXT
We like the %1%. The leader %2% is nice. Please give him/her/it the technology %3%.

In which the pronoun him, her or it would need to be selected based on the gender of %2%.

I imagine that a method to solve this problem in English would be applicable to most similar gender problems in other languages. Perhaps this should be discussed with tzlaine... Simple stringtables may be insufficient, and this couldn't be solved by including the varying part in the string replacing %2%, since the varying part is not adjacent to the noun itself.

And by "him/her/it", I point out that we should have a neutral gender, for both aliens without genders, and because, AFAIK, some human languages have neutral gendered nouns as well...
Last edited by Geoff the Medio on Tue Jan 11, 2005 10:24 pm, edited 1 time in total.

noelte
Juggernaut
Posts: 872
Joined: Fri Dec 26, 2003 12:42 pm
Location: Germany, Berlin

#35 Post by noelte »

@Geoff
I guess Blade Runner was talking about how translation is done within SMAC, not within FO. I think he mentioned important issues, but I think everything can be handled in the way we do it already (not sure about the gender part). Unless we need something like

OTHER_REQUEST_TEXT
We like the %1%. The leader %2% is nice. Please give (switch(gender) case male:him;break;case female:her;default:it;) the technology %3%.

this should be done programmatically (as you said)
OTHER_REQUEST_TEXT
We like the %1%. The leader %2% is nice. Please give %4% the technology %3%.
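
As a rough Python sketch of the "programmatic" variant (not actual FO code; the PRONOUNS table, fill and other_request are names made up here), the calling code would pick the pronoun and hand it over as the fourth parameter:

```python
import re

# Hypothetical English pronoun table, including a neutral gender.
PRONOUNS = {"male": "him", "female": "her", "neutral": "it"}

OTHER_REQUEST_TEXT = (
    "We like the %1%. The leader %2% is nice. "
    "Please give %4% the technology %3%."
)


def fill(template: str, *args: str) -> str:
    """Replace each %N% in the template with the N-th argument (1-based)."""
    return re.sub(r"%(\d+)%", lambda m: args[int(m.group(1)) - 1], template)


def other_request(empire: str, leader: str, tech: str, leader_gender: str) -> str:
    # The game code, not the stringtable, decides which pronoun to pass.
    pronoun = PRONOUNS.get(leader_gender, PRONOUNS["neutral"])
    return fill(OTHER_REQUEST_TEXT, empire, leader, tech, pronoun)


print(other_request("Klingons", "Gowron", "Lasers", "male"))
```

The drawback is that the pronoun table lives in code, so every language needing a different kind of extra word would need its own code change.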
Press any key to continue or any other key to cancel.
Can COWs fly?

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#36 Post by Geoff the Medio »

noelte wrote:@Geoff
I guess Blade Runner was talking about how translation is done within SMAC, not within fo.
He was, but similar issues will probably arise with FO as he had with SMAC...
noelte wrote:OTHER_REQUEST_TEXT
We like the %1%. The leader %2% is nice. Please give %4% the technology %3%.
Do you know how text such as the above is treated in code?

I would expect that the various labels like "FW_FLEET_MOVING_TO" that appear in eng_stringtable.txt are also hard-coded into the source code, and that the number of "blanks" to be filled in (the %1% things) are also hard-coded, and the number that appear in the stringtable entry have to match the number that are expected for that label in code. Is this accurate?

If so, this means that translators such as Blade Runner can't alter the number of parameters in a string, so can't add extra options for gender altered pronouns or articles that are necessary in the language being translated into, if they aren't already in there because they're necessary in english...

This means that, in order to be translatable to any language, we'd have to have enough %1% entries in each string to cover all possible variations... which is probably way too many, since they are quite specific... eg. %1% is the appropriate him/her/it, and can't be general purpose in any way.

So, based on that, it seems like some built in logic in the stringtable entries would be beneficial... so as to pick between, for example, him, her or it, based on the gender of the noun... and so as to be able to deal with any similar issues in other languages that the programmers or english language stringtable entry writers can't anticipate, since they're not fluent in every known language... (Edit: and even if they were, different grammars in different languages might require types of articles or pronouns in different ways, which could be problematic...)

noelte
Juggernaut
Posts: 872
Joined: Fri Dec 26, 2003 12:42 pm
Location: Germany, Berlin

#37 Post by noelte »

Hmm, I haven't had much to do with those strings yet, so I'm not sure I've got everything right, but Zach will correct me if not ;-)

OTHER_REQUEST_TEXT
We like the %1%. The leader %2% is nice. Please give %4% the technology %3%.

1 - every translation of OTHER_REQUEST_TEXT has to match exactly the number of parameters (here 4) (fewer than 4 parameters won't hurt, but more would)
2 - you can reorder the position of any parameter, but for instance %2% will always be a placeholder for the leader name.
3 - you can't add any new parameters (how would the compiler know about them?)


I would like to change it to match our requirements, for instance so that you don't have to use all parameters. For example:

OTHER_REQUEST_TEXT
Please give %4% the technology %3%.
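
To illustrate the three rules, here is a small Python sketch of position-based substitution. It is only my own illustration, not the formatting code FO uses, and fill and ARGS are made-up names:

```python
import re

PLACEHOLDER = re.compile(r"%(\d+)%")

# The values the code supplies for OTHER_REQUEST_TEXT, always in this order.
ARGS = ["Klingons", "Gowron", "Lasers", "him"]


def fill(template: str, args: list) -> str:
    """Substitute %N% by position; templates may reorder or omit parameters."""
    def lookup(match: re.Match) -> str:
        index = int(match.group(1))
        if not 1 <= index <= len(args):
            # Rule 3: a translation cannot invent a parameter the code never passes.
            raise ValueError(f"%{index}% used but only {len(args)} values supplied")
        return args[index - 1]

    return PLACEHOLDER.sub(lookup, template)


# Rule 2: reordered parameters still pick up the right values.
print(fill("The technology %3% should go to %2% of the %1%.", ARGS))
# Proposed change: parameters that are not needed are simply left out.
print(fill("Please give %4% the technology %3%.", ARGS))
```

In this sketch leaving a parameter out is harmless, while using %5% fails, which matches "fewer won't hurt, but more would".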

any thoughts?

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#38 Post by Geoff the Medio »

noelte wrote:any thoughts?
Being able to omit parameters that you don't need is helpful, but it still could pose a problem for languages that need more parameters due to differing grammars...

Another problem is contractions... whether the first letter of something is a vowel, for instance, often changes the preceding article in French... eg. l'arbre vs. le livre. I'm not sure how often this would be an issue... and it could be resolved by having a form of the word with the article available.

Perhaps we should have some SMAC-like properties of objects for strings that might need them... SMAC records the gender of leaders, for example... and the stringtables are written for each case, I think.

We could also have a bigger set of stringtable entries... so that there are articles and some conjugations of a limited set of verbs corresponding to the leader... though this probably isn't practical to solve all the issues that might crop up... We'd also need parameter types defined that aren't necessary in English, but which would be necessary in other languages (eg. the object mode thing Blade Runner mentioned). This will be difficult to do completely enough for all possible languages, though...

We may also need some way to pick between the text options based on properties of a parameter value... So if a parameter in the string is the object of the sentence, and is a "unique" object, the proper conjugation (or somesuch) of the verb in the sentence is selected from options in the stringtable entry. This is rather complicated, in that the parameter value itself doesn't naturally contain this information in an obvious form... so we'd need some sort of markup in the parameter values to indicate to the text generator which version of a conjugation to use...

Perhaps we could use some XML conditions in the stringtables? For example (excuse the format if it's unreasonable... I think you get the idea):

Code:

STRINGTABLE_ENTRY
  <StringtableEntry>
    <Text>I like </Text>
    <Parameter>PARAM1</Parameter>
    <Text>.  </Text>
    <Condition::NounType>
      <Parameter>PARAM1</Parameter>
      <NounType>SINGULAR_FEMALE</NounType>
      <Text>She is </Text>
    </Condition::NounType>
    <Condition::NounType>
      <Parameter>PARAM1</Parameter>
      <NounType>PLURAL</NounType>
      <Text>They are </Text>
    </Condition::NounType>
    <Text>nice.</Text>
  </StringtableEntry>
The PARAM1 value would be another stringtable entry, such as:

Code:

OTHER_STRINGTABLE_ENTRY
  <StringtableEntry>
    <Text>The Klingons</Text>
    <NounType>PLURAL</NounType>
  </StringtableEntry>
Then in game code, STRINGTABLE_ENTRY would be displayed, and if it was passed OTHER_STRINGTABLE_ENTRY, it would display "I like The Klingons. They are nice."
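
A rough Python sketch of how such an entry might be evaluated follows. It is purely illustrative: the Entry and Choice classes and the render function are invented here, and the entries are modelled as Python objects instead of parsed XML.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """A stringtable value plus the grammar marker other strings can test."""
    text: str
    noun_type: str = ""


@dataclass
class Choice:
    """Text chosen by the noun type of one parameter (like Condition::NounType)."""
    param: int
    options: dict
    default: str = ""


def render(parts: list, args: list) -> str:
    """Build the sentence from literal text, parameter numbers and choices."""
    pieces = []
    for part in parts:
        if isinstance(part, str):
            pieces.append(part)  # literal text
        elif isinstance(part, Choice):
            marker = args[part.param - 1].noun_type
            pieces.append(part.options.get(marker, part.default))
        else:
            pieces.append(args[part - 1].text)  # 1-based parameter number
    return "".join(pieces)


STRINGTABLE_ENTRY = [
    "I like ", 1, ". ",
    Choice(1, {"SINGULAR_FEMALE": "She is ", "PLURAL": "They are "}),
    "nice.",
]
OTHER_STRINGTABLE_ENTRY = Entry("The Klingons", "PLURAL")

print(render(STRINGTABLE_ENTRY, [OTHER_STRINGTABLE_ENTRY]))
```

The point of the design is that the game code only passes the parameter; which markers exist and what they select is decided entirely inside each language's stringtable.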

Ideally, the NounTypes could be defined outside of code, like the tech categories are... except they wouldn't be stringtable entries... they'd just be extra info attached to each string that another string could check a parameter for, and alter its form to compensate... you could then define as many as you want in whatever language you're working in, making your strings as variable or as fixed as necessary to have correct grammar, whether or not that info is needed in english or any other language... I assume this would cover the object mode stuff in Hungarian... (comments Blade Runner?) Maybe "NounType" is too specific... we could just have a generic "Grammar" field in which any relevant info about the string could be indicated by any arbitrary marker that could be checked for by other strings, as in the example.

This would still be limited by the number of parameters being hard-coded for each stringtable entry, but presumably if we can add any amount of info to the strings for each language, and any number of checks inside a string for info associated with the parameters to that string, then the stringtable entries should be independent of any one language's grammar...?

Getix
Space Floater
Posts: 29
Joined: Sun Jan 02, 2005 9:43 pm
Location: Italy

#39 Post by Getix »

Blade Runner wrote:
I have to disagree a bit. German sentences are quite long, but Hungarian sentences are usually longer. :D
I am a native Hungarian, so I can do the translation. If the Hungarian text fits, I think the rest of the European languages will fit (German included).
Just a few examples:
csillag rendszer = star system
csillag = star
idegen lények = aliens
etc. :)
(There are a few short words obviously, like:
bolygó = planet
gép = machine
hajó = ship)
Ahem

Sistema Solare = Star System
Stella = Star
Alieni = Aliens
Pianeta = Planet
Macchina = Machine (*)
Nave = Ship

(*) In Italian "Macchina" also means Car or Industrial Machine ...
Getix "The Cromist", (20, 90, Italy, MI)
FIAT CROMA CHT (called Laura) Acrobatic Driver - 32,5 kKm/243 KKm
"A Croma is Forever"

Getix
Space Floater
Posts: 29
Joined: Sun Jan 02, 2005 9:43 pm
Location: Italy

Re: Romance Languages

#40 Post by Getix »

Black_Dawn wrote:
The "european languages" shouldn't be too problematic.
You would be surprised. German (and related languages) should in fact be one of the easier languages to do, because the sentence structure of German is very similar to English (it's not called anglo-SAXON for nothing). Romance languages on the other hand tend to reverse certain sentence elements. A direct translation of the words "hot dog" into French becomes "chien chaud" ("dog hot") for example.

Add to this the fact that objects in Romance languages are spoken of as either "masculine" or "feminine" and that word structure changes accordingly. For example, in the phrase "La table est grosse" (the table is large), table is feminine, so le becomes la and we add se to the end of gros. Of the major Romance languages, French is the most complicated (what with the antiquated accent and grammar system). I would suggest translating your game into French first, and then from French to the other Romance languages (Italian, Spanish, Portuguese, etc.)

Trust me, Italian is MUCH more difficult than French... It is better to translate from ENG -> ITA and ENG -> FRE

:)

tzlaine
Programming Lead Emeritus
Posts: 1092
Joined: Thu Jun 26, 2003 1:33 pm

#41 Post by tzlaine »

noelte wrote:Hmm, I haven't had much to do with those strings yet, so I'm not sure I've got everything right, but Zach will correct me if not ;-)

OTHER_REQUEST_TEXT
We like the %1%. The leader %2% is nice. Please give %4% the technology %3%.

1 - every translation of OTHER_REQUEST_TEXT has to match exactly the number of parameters (here 4) (fewer than 4 parameters won't hurt, but more would)
2 - you can reorder the position of any parameter, but for instance %2% will always be a placeholder for the leader name.
3 - you can't add any new parameters (how would the compiler know about them?)


I would like to change it to match our requirements, for instance so that you don't have to use all parameters. For example:

OTHER_REQUEST_TEXT
Please give %4% the technology %3%.

any thoughts?
The %X% fields are the things that the program is trying to display to the user; the other text is just dressing. So it seems really unlikely that we'll want to vary the number of parameters. In other words, if we want the user to know that planet "Foo" has been destroyed by empire "Bar", we need BOTH those items communicated to the user, regardless of language. Also, the library we're using to accomplish the formatting barfs when you give it a string with a different number of fields than the parameters you give it.
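
For translators, one cheap safeguard would be a check like the following Python sketch. It is hypothetical: the label PLANET_DESTROYED and both function names are invented here, and this is not how the formatting library itself works. It flags any translated entry whose set of %N% fields differs from the English reference:

```python
import re

PLACEHOLDER = re.compile(r"%(\d+)%")


def placeholder_indices(text: str) -> set:
    """Return the set of parameter numbers used in a stringtable entry."""
    return {int(number) for number in PLACEHOLDER.findall(text)}


def mismatched_labels(reference: dict, translation: dict) -> list:
    """List labels whose translation uses different %N% fields than the reference."""
    return sorted(
        label
        for label, text in translation.items()
        if label in reference
        and placeholder_indices(text) != placeholder_indices(reference[label])
    )


english = {"PLANET_DESTROYED": "Planet %1% has been destroyed by empire %2%."}
reordered = {"PLANET_DESTROYED": "Empire %2% has destroyed planet %1%."}
incomplete = {"PLANET_DESTROYED": "Planet %1% has been destroyed."}

print(mismatched_labels(english, reordered))   # word order may change freely
print(mismatched_labels(english, incomplete))  # a dropped field is reported
```

Run over a whole translated stringtable, a check of this kind would catch mismatches before the game ever tries to format the string.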

@ Geoff:
Wow. We're simply not in the language business. Your suggestion is too much for me to want to deal with.

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#42 Post by Geoff the Medio »

tzlaine wrote:...if we want the user to know that planet "Foo" has been destroyed by empire "Bar", we need BOTH those items communicated to the user, regardless of language.
The idea was to add additional parameters in order to pass the correct pronoun or article or conjugation, not to remove basic info that's necessary regardless of language.
tzlaine wrote:@ Geoff:
Wow. We're simply not in the language business. Your suggestion is too much for me to want to deal with.
Given that commercial games (eg. SMAC) don't do this or anything equivalent, I figured it would be a post v1.0 issue, if ever. The game would still be playable in Hungarian, or any language, even if not grammatically correct. It is unfortunate, as we want to be as inclusive as possible...

Perhaps someone else will tackle it? (I wouldn't want to delay the main game code by having you work on translation issues anyway...)

noelte
Juggernaut
Posts: 872
Joined: Fri Dec 26, 2003 12:42 pm
Location: Germany, Berlin

#43 Post by noelte »

Geoff the Medio wrote: ..., I figured it would be a post v1.0 issue, if ever. The game would still be playable in Hungarian, or any language, even if not grammatically correct. It is unfortunate, as we want to be as inclusive as possible...
If we need something like what Geoff suggested, we have to do it now. There will be way too many places where strings are constructed (for it to be introduced after v1.0).

Maybe someone can come up with some examples of the usefulness of Geoff's proposal. I will play FO in English, so it doesn't bother me. 8)

Geoff the Medio
Programming, Design, Admin
Posts: 13587
Joined: Wed Oct 08, 2003 1:33 am
Location: Munich

#44 Post by Geoff the Medio »

noelte wrote:If we need something like what Geoff suggested, we have to do it now. There will be way too many places where strings are constructed (for it to be introduced after v1.0).
Good point... rewriting all the stringtables and code would be a huge pain... and then they'd have to be all retranslated as well... blah.

Edit: Though it can probably wait until v0.5 when diplomacy gets added... There's mostly just complete isolated sentences right now... there's little need for complicated self-modifying strings if you know the full context beforehand... (However diplomacy will probably have lots of unknowable contexts)
noelte wrote:Maybe someone can come up with some examples of the usefulness of Geoff's proposal.
For translations, I don't speak or write any other languages fluently, so perhaps Blade Runner can give some specific relevant examples of situations like those he had problems with while attempting to translate SMAC (like the "gender part" and "object mode")?

I did find this page though:
http://www.perldoc.com/perl5.8.0/lib/Lo ... pen-To-You

Which illustrates some of the ways that other languages might need complicated logic to correctly form sentences that are simple in english. How often these would be relevant to FO stringtable entries, I'm not sure... But something along the lines of

"We offer to trade 24 units of Energium Crystals and 1 galactic credit per year for a lump sum payment of 244 galactic credits. Will you accept?"

would need to be represented with a whole slew of parameters that would alter various other parts of the string by conjugation or modification due to number or gender. Unless the whole message is built up from smaller strings by the code (which probably wouldn't work either, given grammatical differences), we'll need some way to change things...

For English-language usage, the simplest example is plurality. If offering "1 galactic credit", you don't want an "s" on the end of "credit", but if offering 2 or more, you want the "s" (and there are more cases in Arabic, according to that article). We could either write a separate string for each case (eg. hard-code for English grammar) or have a test in the string based on the number to determine whether to add the "s", which could be omitted or expanded for other languages without changing the game code.
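
As a minimal Python sketch of the "test based on the number" option (everything here is invented for illustration: the category names, PLURAL_RULES, CREDITS and format_credits), each language would supply its own rule and its own set of string variants, so the game code only passes the number:

```python
def english_plural(n: int) -> str:
    """English has two categories: exactly one, and everything else."""
    return "one" if n == 1 else "other"


# Hypothetical per-language rule table; another language would register its
# own function here and could return more than two categories.
PLURAL_RULES = {"en": english_plural}

# Hypothetical stringtable entry with one variant per plural category.
CREDITS = {
    "en": {"one": "%1% galactic credit", "other": "%1% galactic credits"},
}


def format_credits(n: int, lang: str = "en") -> str:
    """Pick the variant matching the number, then fill in the number."""
    category = PLURAL_RULES[lang](n)
    return CREDITS[lang][category].replace("%1%", str(n))


print(format_credits(1))
print(format_credits(244))
```

A language that needs no distinction would map every number to a single category, and one with more cases would add categories and variants, all without touching the calling code.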

Edit: Realized a suggestion for hard-coded markup was english-specific, so removed...
Last edited by Geoff the Medio on Fri Jan 14, 2005 7:23 am, edited 1 time in total.

Bastian-Bux
Creative Contributor
Posts: 215
Joined: Fri Jun 27, 2003 6:32 am
Location: Kassel / Germany

#45 Post by Bastian-Bux »

Well, German would be one of the easier languages for grammar purposes, as the rigid English SPO order works in German as well. Sounds a bit boring and low level, but well.
Wenn du die Macht hättest die Geschichte zu ändern, wo würdest du anfangen. Und viel wichtiger, wo aufhören?

If you had the power to change history, where would you start? And more importantly, where would you stop?
