    Main » Hacking » Why is ROM translation so (technologically) hard?
    Posted on 19-06-20, 22:46 (revision 1)

    Post: #136 of 210
    Since: 10-29-18

    Last post: 1889 days
    Last view: 1861 days
    Posted by sureanem
    You can put more than two characters in a kanji.

    You're clearly trying to compare radicals to kanji compounds, which is just hilarious scrambling considering how radicals work. This isn't Hangul, buddy.
    $gf tells me you have too much time on your hands.
    Posted on 19-06-20, 23:12
    Stirrer of Shit
    Post: #424 of 717
    Since: 01-26-19

    Last post: 1776 days
    Last view: 1774 days
    I'm not talking about any radicals or compounds, man. That sounds like something you'd want to advertise your sports drink has got lots of (or none at all, I wouldn't know - free radicals cause cancer, right?).

    You can fit more than two (2) alphabetic characters in the area of the sprite that would ordinarily be used to render one (1) discrete kanji logogram.

    Provided a kanji logogram is encoded with two bytes and an alphabetic character ordinarily would be encoded with one, this reduces the amount of space needed to encode a given sequence of alphabetic characters.
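A rough sketch of the arithmetic behind that claim, assuming a made-up scheme where certain three-letter chunks get their own two-byte code (drawn into one kanji-sized tile) and everything else costs one byte per character. The function name `encoded_size`, the chunk width, and the `packed` set are all hypothetical, not taken from any real game.

```python
def encoded_size(text, packed, chars_per_tile=3):
    """Count bytes when some fixed-width chunks map to a 2-byte code.

    Assumption: a chunk listed in `packed` costs 2 bytes (one kanji-style
    code); any other character costs 1 byte. Matching is greedy, left to right.
    """
    size = 0
    i = 0
    while i < len(text):
        chunk = text[i:i + chars_per_tile]
        if chunk in packed:
            size += 2               # one two-byte code covers the whole chunk
            i += chars_per_tile
        else:
            size += 1               # ordinary single-byte character
            i += 1
    return size


text = "the cat and the hat"
plain_size = len(text)                               # one byte per character
packed_size = encoded_size(text, {"the", "and"})     # three chunks get packed
print(plain_size, packed_size)
```

Each packed chunk saves exactly one byte here (three characters in two bytes), so the gain depends entirely on how often the chosen chunks actually occur in the script.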

    There was a certain photograph about which you had a hallucination. You believed that you had actually held it in your hands. It was a photograph something like this.
    Posted on 19-06-21, 00:23
    Stirrer of Shit
    Post: #425 of 717
    Since: 01-26-19

    Last post: 1776 days
    Last view: 1774 days
    Posted by Kawa
    Unfortunately, the number of 90s/00s console and handheld games that have used Unicode in any form since Unicode's inception (1991) can be counted on one hand, and UCS-2 (as "classic UTF-16", a.k.a. "Unicode", is properly called) is considered wasteful.

    Fun fact about dedicating tile space to digraphs: at least one Final Fantasy fan translation that I've seen did this, with at most about ten character values being digraphs like 'll', 'il', or 'th'. I'll bet biscuits to an asskicking that this was done primarily for the visual aspect, and that the actual text was still repointed to fit.

    Well, yeah, then, uh, case closed. Personally, I think UCS-2 should be called either Unicode or wchar_t, but "classic UTF-16" seems like a reasonable compromise to minimize confusion.

    So why is repointing so important, then? It's a slight improvement, but you sure could do without it if the ratios are as you say. How come it can preclude games from getting translated?


    Posted on 19-06-21, 01:13
    Custom title here

    Post: #526 of 1164
    Since: 10-30-18

    Last post: 76 days
    Last view: 4 days
    Because translation isn't an exact science. Some statements will be shorter, others longer. Some will have to be reworked to fit the game's output, which can change the length.
    Some games just have insane Lovecraftian nightmares where one would expect the text engine code to be.


    I don't know what specific games you're thinking of, so I can only speak in vague generalities.

    --- In UTF-16, where available. ---
    Posted on 19-06-21, 06:24
    20% cooler

    Post: #279 of 599
    Since: 10-29-18

    Last post: 208 days
    Last view: 55 sec.
    User is online
    Chrono Trigger's text encoding included a large swathe of dictionary lookup bytes, mapping one byte value to two or more characters, along with the general "insert name here" bytes. This would let entire parts of words like "pedia" be saved as one byte in the original text string, but decode into the full version for display in the dialogue box. The dictionary is not based on the top 30 of a given language, but tailored to the needs of the game.
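A minimal sketch of that kind of decoder, to make the idea concrete. The byte values, the table contents, and the ASCII fallback below are invented for illustration; they are not Chrono Trigger's actual encoding.

```python
# Hypothetical tables: NOT Chrono Trigger's real byte values.
DICTIONARY = {0xA0: "pedia", 0xA1: "the ", 0xA2: "ing"}
NAME_SLOTS = {0xF0: "Crono"}   # stand-in for an "insert name here" byte


def decode(data, dictionary=DICTIONARY, names=NAME_SLOTS):
    """Expand dictionary bytes and name bytes; pass other bytes through."""
    out = []
    for b in data:
        if b in dictionary:
            out.append(dictionary[b])   # one byte becomes several characters
        elif b in names:
            out.append(names[b])        # substituted at display time
        else:
            out.append(chr(b))          # assumption: literal bytes are ASCII
    return "".join(out)


raw = bytes([0xF0]) + b" read " + bytes([0xA1]) + b"encyclo" + bytes([0xA0])
text = decode(raw)
print(len(raw), len(text), text)
```

In this toy case 16 stored bytes expand to 27 displayed characters; the saving comes from picking dictionary entries that match the game's own script rather than a generic frequency list.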

    Unfortunately, I don't know what the Japanese version's text encoding is like, only that the names and dictionary lookups are there too, so I don't know how it handles kanji.

    Does CT have kanji?
    Posted on 19-06-21, 10:46

    Post: #137 of 210
    Since: 10-29-18

    Last post: 1889 days
    Last view: 1861 days
    Posted by sureanem
    I'm not talking about any radicals or compounds, man. That sounds like something you'd want to advertise your sports drink has got lots of (or none at all, I wouldn't know - free radicals cause cancer, right?).

    You can fit more than two (2) alphabetic characters in the area of the sprite that would ordinarily be used to render one (1) discrete kanji logogram.

    Provided a kanji logogram is encoded with two bytes and an alphabetic character ordinarily would be encoded with one, this reduces the amount of space needed to encode a given sequence of alphabetic characters.

    So you basically don't know what you're talking about. Gotcha.
    Posted on 19-06-21, 11:33
    Derpy is best pony

    Post: #281 of 599
    Since: 10-29-18

    Last post: 208 days
    Last view: 55 sec.
    User is online
    What's worse is, with all this talk of specific encoding schemes...

    repointing to make room for trivially-encoded strings is the easiest way, even on systems with "nasty" pointers.
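For anyone following along, a bare-bones sketch of what repointing means, under simplifying assumptions: a table of 16-bit little-endian pointers that are plain offsets into the ROM image, zero-terminated strings, and a known free region. Real games often use bank-relative or otherwise "nasty" pointers, so the helper names and layout here are hypothetical.

```python
import struct


def repoint(rom, table_off, index, new_text, free_off, terminator=b"\x00"):
    """Write new_text into free space and aim pointer `index` at it.

    Assumptions: pointers are 16-bit little-endian offsets from the start
    of `rom`, and the region starting at free_off is unused and big enough.
    """
    data = new_text + terminator
    rom[free_off:free_off + len(data)] = data
    struct.pack_into("<H", rom, table_off + 2 * index, free_off)
    return free_off + len(data)          # next free offset


def read_string(rom, table_off, index, terminator=b"\x00"):
    """Follow pointer `index` and read up to the terminator."""
    (ptr,) = struct.unpack_from("<H", rom, table_off + 2 * index)
    end = rom.index(terminator, ptr)
    return bytes(rom[ptr:end])


# Toy 64-byte "ROM": pointer table at 0, two strings, free space from 32.
rom = bytearray(64)
struct.pack_into("<HH", rom, 0, 4, 10)
rom[4:10] = b"HELLO\x00"
rom[10:14] = b"BYE\x00"

next_free = repoint(rom, 0, 1, b"GOODBYE FOR NOW", 32)
print(read_string(rom, 0, 0), read_string(rom, 0, 1), next_free)
```

The old "BYE" bytes are simply left behind; the longer replacement lives in free space and only the two-byte pointer changes, which is why the string's encoding barely matters once repointing is available.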



    Seriously though, there's basically only one reason for Chrono Trigger to have dictionary lookup bytes and that's "ROM is expensive".
    Posted on 19-06-21, 18:45

    Post: #69 of 100
    Since: 10-30-18

    Last post: 1795 days
    Last view: 1360 days
    love 2 unironically write the string of text "discrete kanji logogram"
    Posted on 19-06-21, 22:07

    Post: #138 of 210
    Since: 10-29-18

    Last post: 1889 days
    Last view: 1861 days
    Yeah, why can't we just say singular adopted logographic Chinese character?
    Posted on 19-06-22, 08:29

    Post: #70 of 100
    Since: 10-30-18

    Last post: 1795 days
    Last view: 1360 days
    formally numberless conceptual-morphemic orthographical unit of the middle empire
    Posted on 19-06-22, 12:02
    H3H3H3H

    Post: #282 of 599
    Since: 10-29-18

    Last post: 208 days
    Last view: 55 sec.
    User is online
    Yup, this is a byuuboard alright.
    Posted on 19-07-21, 02:31

    Post: #45 of 49
    Since: 10-29-18

    Last post: 1914 days
    Last view: 1799 days
    Can confirm firsthand what Kawa wrote about Chrono Trigger's text being stored with a tailored dictionary. One of the things I wrote into my nascent (and recently untouched) CT ROM parser is the ability to read the stored text data. I read about the compression/decompression method early on in Geiger's notes, and it checked out.
    Posted on 19-07-21, 09:43
    A man of wealth and taste

    Post: #304 of 599
    Since: 10-29-18

    Last post: 208 days
    Last view: 55 sec.
    User is online
    I'd damn well hope so considering the amount of research time I put in that post.