
convert utf-8 topic messages to ISO_8859-1

asked 2019-11-12 15:43:12 -0600

inaciose

Hi!

I have a problem converting a std_msgs::String message that contains UTF-8 encoded characters. If the string is hard-coded, the conversion works fine using this sample block of code inside the ROS node:

    // iconv_open takes the target encoding first, then the source encoding
    iconv_t cd = iconv_open("UTF-8", "ISO_8859-1");
    if (cd == (iconv_t) -1) {
        perror("iconv_open failed!");
        //return 1;
    }

    char charstr[] = "ent\xE3o conta-me l\xE1 como \xE9 que te chamas";
    char *in_buf = &charstr[0];
    size_t in_left = sizeof(charstr) - 1;  // exclude the terminating NUL

    // strlen returns size_t, so print it with %zu
    printf("hardcode string: %zu\n", strlen(charstr));
    for (size_t i = 0; i < strlen(charstr); i++) {
        printf("(%d %c) ", charstr[i], charstr[i]);
    }
    printf("\n");

    char output[255];
    char *out_buf = &output[0];
    size_t out_left = sizeof(output) - 1;  // keep one byte for the NUL

    do {
        if (iconv(cd, &in_buf, &in_left, &out_buf, &out_left) == (size_t) -1) {
            perror("iconv failed!");
            break;  // stop instead of retrying the same failing input forever
        }
    } while (in_left > 0 && out_left > 0);
    *out_buf = 0;

    printf("%s -> %s\n", charstr, output);

It prints the following, which is correct:

ent�o conta-me l� como � que te chamas -> então conta-me lá como é que te chamas

If I try to feed the conversion with the same string from a subscribed topic message (I manually publish exactly the same string), using the same code adapted only to read the topic message, as below:

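    // data is the std::string payload of the received message
    std::copy(data.begin(), data.end(), charstr);
    charstr[data.length()] = 0;

    in_buf = &charstr[0];
    // NOTE: sizeof(charstr) is still the size of the hard-coded array,
    // not data.length(), so a longer message is only partly converted
    in_left = sizeof(charstr) - 1;

    out_buf = &output[0];
    out_left = sizeof(output) - 1;

    do {
        if (iconv(cd, &in_buf, &in_left, &out_buf, &out_left) == (size_t) -1) {
            perror("iconv failed!");
            break;  // stop instead of retrying the same failing input forever
        }
    } while (in_left > 0 && out_left > 0);
    *out_buf = 0;

    printf("%s -> %s\n", charstr, output);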

It prints the following, which is wrong because nothing was converted:

ent\xE3o conta-me l\xE1 como \xE9 que te chamas -> ent\xE3o conta-me l\xE1 como \xE9 que

So... the code is the same, the input supposedly the same.

How do you handle UTF-8 strings in topic messages in C++ nodes?

A code sample would be gold!

Thanks


1 Answer


answered 2019-11-13 19:10:58 -0600

inaciose

After spending some more time on this, I found that my test procedure was wrong.

Long story: I printed all the bytes of the hard-coded string and of the topic message and found that \xE3, for instance, had a different representation in each. I then started writing code to unescape the topic message and convert it to the same number of bytes and the same values as the hard-coded representation. In the middle of those changes, I suddenly saw the light! :)
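For anyone who wants to see the difference in bytes, here is a minimal sketch, assuming the pasted message carried the four literal characters backslash, x, E, 3 where the compiled string holds the single byte 0xE3. The helper names hex_dump and unescape_hex are hypothetical and are not taken from the original node.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>

// Render every byte as two hex digits separated by spaces.
std::string hex_dump(const std::string& s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Hypothetical helper: replace each literal "\xHH" sequence (backslash,
// 'x', two hex digits) with the single byte 0xHH. Everything else is
// copied unchanged.
std::string unescape_hex(const std::string& s) {
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool is_escape =
            s[i] == '\\' && i + 3 < s.size() && s[i + 1] == 'x' &&
            std::isxdigit(static_cast<unsigned char>(s[i + 2])) &&
            std::isxdigit(static_cast<unsigned char>(s[i + 3]));
        if (is_escape) {
            out += static_cast<char>(std::stoi(s.substr(i + 2, 2), nullptr, 16));
            i += 3;  // skip the rest of the escape sequence
        } else {
            out += s[i];
        }
    }
    return out;
}
```

Calling hex_dump on the compiled literal "ent\xE3o" gives five bytes, while the pasted text (written in C++ as "ent\\xE3o") gives eight, and unescape_hex maps the second form back to the first.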

So I just built a launch file and remapped the output topic to the input topic. That way I did not need to copy and paste the message from one topic and publish it on the other, as I had done in all the earlier tests. The light was true: it works well. The real problem was my rush to test it, so I took a quick road (copy/paste) and got lost.

My problem is solved, and to be honest it was never a real problem, only a mistake. But in case it happens to someone else, be advised: test nodes that handle UTF-8 messages containing non-English text by publishing directly; don't copy and paste between topics.
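Since the question asked for a code sample, here is a minimal sketch of the conversion working on a std::string, such as the data field of a std_msgs::String, without a fixed-size buffer. It assumes the glibc iconv signature (char** for the input buffer) and the encoding name "ISO-8859-1". The function name latin1_to_utf8 is hypothetical.

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Convert ISO-8859-1 bytes to UTF-8. The input length comes from the
// string itself, so it always matches the message actually received.
std::string latin1_to_utf8(const std::string& in) {
    if (in.empty()) return std::string();

    // iconv_open takes the target encoding first, then the source encoding.
    iconv_t cd = iconv_open("UTF-8", "ISO-8859-1");
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed");
    }

    // Each ISO-8859-1 byte becomes at most two UTF-8 bytes.
    std::string out(in.size() * 2, '\0');

    // Assumption: glibc-style iconv, which wants a non-const char**.
    char* in_buf = const_cast<char*>(in.data());
    std::size_t in_left = in.size();
    char* out_buf = &out[0];
    std::size_t out_left = out.size();

    std::size_t rc = iconv(cd, &in_buf, &in_left, &out_buf, &out_left);
    iconv_close(cd);
    if (rc == (std::size_t)-1) {
        throw std::runtime_error("iconv failed");
    }

    out.resize(out.size() - out_left);  // drop the unused tail
    return out;
}
```

In a subscriber callback this could be called as latin1_to_utf8(msg->data). Sizing the output from in.size() avoids both the truncation and the overflow that a fixed char array invites when the incoming message is longer than expected.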

