Creating a certificate with Unicode characters in Issuer and Subject

Creating a certificate with Unicode characters in Issuer and Subject

Shaw Graham George
Hi,

I have a requirement to make some test keys/certificates that contain
Unicode (Chinese) data in the Issuer and Subject fields.  Print-out from
an example certificate using "openssl x509" is:

        Issuer: C=\x00C\x00N,
ST=\x00G\x00u\x00a\x00n\x00g\x00d\x00o\x00n\x00g,
L=\x00G\x00u\x00a\x00n\x00g\x00z\x00h\x00o\x00u,
O=\x00G\x00D\x00C\x00A\x00
\x00C\x00e\x00r\x00t\x00i\x00f\x00i\x00c\x00a\x00t\x00e\x00
\x00A\x00u\x00t\x00h\x00o\x00r\x00i\x00t\x00y
        Subject: C=\x00C\x00N, ST=^\x7FN\x1Cw\x01, L=^\x7F]\xDE^\x02,
...

Is this at all possible using the openssl tool?  From the manual pages
it seems that UTF-8 is supported, but not Unicode - for example the
config man page says that null characters in strings are not allowed.

If not, then does anybody know of any other tools that I could use to
make my test keys/certificates?

Thanks in advance,

George.
______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    [hidden email]
Automated List Manager                           [hidden email]

Re: Creating a certificate with Unicode characters in Issuer and Subject

Dr. Stephen Henson
On Thu, Nov 19, 2009, Shaw Graham George wrote:

> Hi,
>
> I have a requirement to make some test keys/certificates that contain
> Unicode (Chinese) data in the Issuer and Subject fields.  Print-out from
> an example certificate using "openssl x509" is:
>
>         Issuer: C=\x00C\x00N,
> ST=\x00G\x00u\x00a\x00n\x00g\x00d\x00o\x00n\x00g,
> L=\x00G\x00u\x00a\x00n\x00g\x00z\x00h\x00o\x00u,
> O=\x00G\x00D\x00C\x00A\x00
> \x00C\x00e\x00r\x00t\x00i\x00f\x00i\x00c\x00a\x00t\x00e\x00
> \x00A\x00u\x00t\x00h\x00o\x00r\x00i\x00t\x00y
>         Subject: C=\x00C\x00N, ST=^\x7FN\x1Cw\x01, L=^\x7F]\xDE^\x02,
> ...
>
> Is this at all possible using the openssl tool?  From the manual pages
> it seems that UTF-8 is supported, but not Unicode - for example the
> config man page says that null characters in strings is not allowed.
>
> If not, then does anybody know of any other tools that I could use to
> make my test keys/certificates.
>

Characters are passed to OpenSSL as UTF-8; then, depending on the
configuration options, they are translated into either a BMPString or a
UTF8String. From an application point of view it shouldn't matter which
(RFC 3280 and later mandate UTF8Strings).

OpenSSL will *NOT* however do what happens above with the C (Country) field.
That is a two character code and only PrintableString (a restricted version of
ASCII) characters are permitted. Doing anything else violates several
standards.
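
As a rough illustration of the PrintableString restriction, here is a minimal Python sketch. The character set is the one ASN.1 defines for PrintableString; the helper names are made up for this example and are not part of OpenSSL.

```python
import string

# Characters permitted in an ASN.1 PrintableString:
# letters, digits, space and ' ( ) + , - . / : = ?
PRINTABLE_STRING_CHARS = frozenset(
    string.ascii_letters + string.digits + " '()+,-./:=?"
)


def is_printable_string(value: str) -> bool:
    """Return True if every character is legal in a PrintableString."""
    return all(ch in PRINTABLE_STRING_CHARS for ch in value)


def is_plausible_country_code(value: str) -> bool:
    """Hypothetical check: exactly two uppercase ASCII letters."""
    return (
        len(value) == 2
        and value.isascii()
        and value.isalpha()
        and value.isupper()
    )
```

With these helpers, "CN" passes both checks, while the four-byte value \x00C\x00N from the sample certificate fails both because of the embedded zero bytes.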

BTW if you pick appropriate values for the -nameopt option and if your
terminal supports it you should be able to get that certificate to display
correctly.

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org

RE: Creating a certificate with Unicode characters in Issuer and Subject

mclellan_dave
In reply to this post by Shaw Graham George
UTF-8 *IS* perfectly valid Unicode -- it's one of the main Unicode
encodings, and seems entirely appropriate for use in certs, although I
personally have no knowledge of the support in OpenSSL or the X509
standard.  UTF-8 is a variable-length encoding in which each valid
character takes from 1 to 4 bytes (the original design allowed up to 6,
but RFC 3629 restricts it to 4).

UTF-8 encodes the first 128 ASCII characters identically to 7-bit ASCII,
and UTF-8 strings preserve the notion of a null-terminated character
string, such that the zero byte terminates a UTF-8 string compatibly
with ASCII null-terminated strings.

So the warning that a null character is not allowed in a string really
means it can't be embedded in the middle of a string, since the null
will be interpreted as *terminating* the string.

This is NOT the case with UTF-16.  Individual bytes in a UTF-16 encoding
may certainly be zero, and they do NOT terminate a string.  So it makes
sense that UTF-16 would not be supported in the Issuer and Subject
fields.  But UTF-8 seems like an excellent fit to me.
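
To make the zero-byte point concrete, here is a small Python sketch that encodes the same text both ways. The helper names are illustrative only.

```python
def encoded_forms(text: str) -> dict:
    """Encode text as UTF-8 and as big-endian UTF-16 (no byte order mark)."""
    return {
        "utf-8": text.encode("utf-8"),
        "utf-16-be": text.encode("utf-16-be"),
    }


def has_zero_byte(data: bytes) -> bool:
    """Return True if the byte string contains a 0x00 byte."""
    return 0 in data


ascii_forms = encoded_forms("CN")
# UTF-8 leaves ASCII untouched; UTF-16 puts a zero byte before each letter.
assert ascii_forms["utf-8"] == b"CN"
assert ascii_forms["utf-16-be"] == b"\x00C\x00N"

# U+5E7F, the first character of the sample Subject's ST field
chinese_forms = encoded_forms("\u5e7f")
assert chinese_forms["utf-8"] == b"\xe5\xb9\xbf"   # three bytes, none zero
assert chinese_forms["utf-16-be"] == b"^\x7f"      # two bytes: 0x5E 0x7F
```

The UTF-16 form of "CN" is exactly the \x00C\x00N shown in the escaped Issuer output, which is why it cannot be passed through an interface that treats a zero byte as end of string.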

The trick is getting the native characters from the user converted to
UTF-8 for storage in the certificate.  Presumably the user enters the
Issuer and Subject data in a GUI or at a command line in a shell that is
using Big5 or GB-18030 character encoding.  The application must convert
the entered data into UTF-8 to pass to the cert creation process.
There are a million ways to do that conversion (an excellent tool is
ICU).
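
One of those ways, sketched with Python's built-in codecs rather than ICU (an assumption made for brevity; the function name is invented for this example):

```python
def transcode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8.

    source_encoding is a Python codec name such as "gb18030" or "big5".
    Raises UnicodeDecodeError if raw is not valid in that encoding.
    """
    return raw.decode(source_encoding).encode("utf-8")


# Simulate input typed in a GB 18030 terminal: U+5E7F U+5DDE U+5E02.
typed = "\u5e7f\u5dde\u5e02".encode("gb18030")
utf8_value = transcode_to_utf8(typed, "gb18030")
```

The resulting bytes are what would then be handed to the certificate creation step.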

Fascinating.  Good luck with it.  I'd like to hear what your progress
is.

+-+-+-+-+-+-+
Dave McLellan, Symmetrix Software
EMC Corporation, 228 South St, Hopkinton MA
Mail Stop LL/AA-24
office 508-249-1257, fax 508-544-2129
cell 978-500-2546, IM: [hidden email]
+-+-+-+-+-+-+


RE: Creating a certificate with Unicode characters in Issuer and Subject

Shaw Graham George
In reply to this post by Dr. Stephen Henson
Thanks Steve,

>> OpenSSL will *NOT* however do what happens above with the C (Country)
>> field. That is a two character code and only PrintableString (a
>> restricted version of ASCII) characters are permitted. Doing anything
>> else violates several standards.

That's interesting, considering that this example certificate was sent
to us by one of our customers, and appears to be issued by the Guangdong
Certificate Authority (GDCA), which is presumably a live CA ...

Is that possible - that a real CA can violate the standards like this?
Or is this just like Microsoft breaking standards - you just have to
live with it?

BTW, the "rogue" example certificate seems OK when used as an input to
other openssl functions ... E.g. openssl smime.

But putting the country name to one side, what about the other data
elements?  I understand that UTF-8 input is possible in openssl.  Are
you saying that only UTF-8 is possible, so if I want Unicode input
I have to find another solution?

G.



Re: Creating a certificate with Unicode characters in Issuer and Subject

Dr. Stephen Henson
On Thu, Nov 19, 2009, Shaw Graham George wrote:

> Thanks Steve,
>
> >> OpenSSL will *NOT* however do what happens above with the C (Country)
> >> field. That is a two character code and only PrintableString (a
> >> restricted version of ASCII) characters are permitted. Doing anything
> >> else violates several standards.
>
> That's interesting, considering that this example certificate was sent
> to us by one of our customers, and appears to be issued by the Guangdong
> Certificate Authority (GDCA), which is presumably a live CA ...
>
> Is that possible - that a real CA can violate the standards like this?
> Or is this just like Microsoft breaking standards - you just have to
> live with it?
>

There are many implementations that violate standards all over the place. The
trick sometimes is to try to live with them without doing so insecurely.

Could you send me a sample certificate like that btw? I'll check it out. It
might be doing something weirder like putting Unicode into a PrintableString.

> BTW, the "rogue" example certificate seems OK when used as an input to
> other openssl functions ... E.g. openssl smime.
>
> But putting the country name to one side, what about the other data
> elements?  I understand the UTF-8 input is possible in openssl.  Is what
> you're saying that it's only UTF-8 that is possible, so if I want
> Unicode input, then I have to find another solution.
>

What I'm saying is that you input characters using UTF8 or can do so in a
config file. Terminals often have a UTF8 mode which does this automatically.

Once the data is in OpenSSL it can decide to translate it into BMPStrings
(Unicode near enough) internally.

That btw is just what the command line utilities do. The APIs are rather more
flexible.
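
For the command line route, a minimal sketch of what that could look like, written as Python that only builds the config text and the argument list. Assumptions: the `utf8` and `string_mask` settings and the `-utf8` flag are the ones documented for `openssl req`; the section name `req_dn`, the file names, and the O value are placeholders; the ST and L values are what the sample Subject bytes decode to when read as big-endian UTF-16.

```python
from pathlib import Path


def render_req_config(state: str, locality: str, organization: str) -> str:
    """Return text for a minimal `openssl req` config with a UTF-8 subject."""
    return "\n".join([
        "[ req ]",
        "prompt = no",
        "utf8 = yes",                 # read field values as UTF-8
        "string_mask = utf8only",     # emit UTF8String where allowed
        "distinguished_name = req_dn",
        "",
        "[ req_dn ]",
        "C = CN",                     # country stays plain ASCII
        f"ST = {state}",
        f"L = {locality}",
        f"O = {organization}",
        "",
    ])


def write_req_config(path: Path, text: str) -> None:
    """Write the config as UTF-8 so openssl sees the bytes it expects."""
    path.write_text(text, encoding="utf-8")


def build_req_command(config_path: str) -> list:
    """Argument list for a self-signed test certificate (not executed here)."""
    return [
        "openssl", "req", "-new", "-x509", "-utf8",
        "-config", config_path,
        "-newkey", "rsa:2048", "-nodes",
        "-keyout", "test-key.pem",
        "-out", "test-cert.pem",
        "-days", "30",
    ]
```

Passing the list to `subprocess.run` would run it, assuming an `openssl` binary is on the PATH. Other `string_mask` values change which string type is chosen, so the setting shown here is only one option.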

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org

RE: Creating a certificate with Unicode characters in Issuer and Subject

Brant Thomsen
In reply to this post by mclellan_dave
One major issue to consider when using UTF-16 encoding is that the string can be big-endian or little-endian.  If you were to somehow generate a certificate using UTF-16 encoded strings, you would need to make sure that those certificates will only be used on machines that have the same architecture as the machine generating the certificate.  Otherwise, the strings will be unreadable.

I would highly recommend just converting your UTF-16 strings into UTF-8 and using that in your certificate(s).  It will save you a lot of headaches.

Brant Thomsen


Re: Creating a certificate with Unicode characters in Issuer and Subject

Dr. Stephen Henson
On Thu, Nov 19, 2009, Brant Thomsen wrote:

> One major issue to consider when using UTF-16 encoding is that the string
> can be big-endian or little-endian.  If you were to somehow generate a
> certificate using UTF-16 encoded strings, you would need to make sure that
> those certificates will only be used on machines that have the same
> architecture as the machine generating the certificate.  Otherwise, the
> strings will be unreadable.
>
> I would highly recommend just converting your UTF-16 strings into UTF-8 and
> using that in your certificate(s).  It will save you a lot of headaches.
>

The encoding rules dictate that BMPStrings have to be big endian format.
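
As an illustration, the escaped bytes from the sample output can be decoded in Python on the assumption that they are big-endian 16-bit code units, as a BMPString would be. Whether the sample certificate really uses BMPString for these fields is not established here, and the function name is invented for this example.

```python
def decode_bmpstring(raw: bytes) -> str:
    """Decode BMPString content octets: big-endian 16-bit code units."""
    return raw.decode("utf-16-be")


# Byte values copied from the escaped "openssl x509" output in this thread.
COUNTRY = b"\x00C\x00N"
STATE = b"^\x7fN\x1cw\x01"      # 5E7F 4E1C 7701
LOCALITY = b"^\x7f]\xde^\x02"   # 5E7F 5DDE 5E02

country = decode_bmpstring(COUNTRY)
state = decode_bmpstring(STATE)
locality = decode_bmpstring(LOCALITY)

# Reading the same bytes little-endian gives unrelated characters,
# which is the failure mode a fixed byte order rules out.
misread = COUNTRY.decode("utf-16-le")
```

Decoded this way, the country is "CN" and the ST and L values are the Chinese names for Guangdong Province and Guangzhou City.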

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org

RE: Creating a certificate with Unicode characters in Issuer and Subject

Shaw Graham George
In reply to this post by Shaw Graham George

No, this is the output from "openssl x509 -text", but without "-nameopt utf8", which has no effect on the output anyway.

G.


-----Original Message-----
From: [hidden email] [mailto:[hidden email]]
Sent: 19 November 2009 17:16
To: Shaw Graham George
Subject: Re: Creating a certificate with Unicode characters in Issuer and Subject

Shaw Graham George wrote:

> Hi,
>
> I have a requirement to make some test keys/certificates that contain
> Unicode (Chinese) data in the Issuer and Subject fields.  Print-out
> from an example certificate using "openssl x509" is:
>
>         Issuer: C=\x00C\x00N,
> ST=\x00G\x00u\x00a\x00n\x00g\x00d\x00o\x00n\x00g,
> L=\x00G\x00u\x00a\x00n\x00g\x00z\x00h\x00o\x00u,
> O=\x00G\x00D\x00C\x00A\x00
> \x00C\x00e\x00r\x00t\x00i\x00f\x00i\x00c\x00a\x00t\x00e\x00
> \x00A\x00u\x00t\x00h\x00o\x00r\x00i\x00t\x00y
>         Subject: C=\x00C\x00N, ST=^\x7FN\x1Cw\x01, L=^\x7F]\xDE^\x02,
> ...

UTF-8 is a means for providing Unicode glyph sequences on computers.
Each Unicode character has 1 reasonable UTF-8 transform.  As per my personal experience, OpenSSL does handle them.

What you have in hand looks more like what happened when a certificate tool converted the output into what appears to be UTF-16 big endian, then emitted that to your terminal.  Very odd.

As it turns out, it looks like the CA you picked did the right thing, as 0x0043 0x004E is "CN".  It's mainly your output program that has made ... unusual choices when asked to emit the subject and issuer to your screen; I'm assuming it wasn't OpenSSL.

Anyway, yes, with the proper options on input, OpenSSL will accept a
UTF-8 stream as elements in the subject and issuer DNs.  I believe that OpenSSL already presumes incoming text is in UTF-8, and "-nameopt utf8" is all you need to emit UTF-8 directly to the terminal.

  Yours, &c
  Lance Dryden

Re: Creating a certificate with Unicode characters in Issuer and Subject

Dr. Stephen Henson
On Thu, Nov 19, 2009, Shaw Graham George wrote:

>
> No, this is the output from "openssl x509 -text", but without "-nameopt utf8", which has no effect on the output anyway.
>

Try -nameopt oneline,utf8,-esc_msb

Also: -nameopt multiline,utf8,-esc_msb,show_type which will show the actual
string types (and might show what that strange C encoding is).
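
Spelled out as full invocations, these could look like the following Python sketch, which only assembles the argument list. The certificate path is a placeholder, and `-noout -subject -issuer` is one assumed way to limit the output to the two names.

```python
def build_x509_display_command(cert_path: str,
                               multiline: bool = False,
                               show_type: bool = False) -> list:
    """Build an `openssl x509` argument list that prints names as UTF-8."""
    layout = "multiline" if multiline else "oneline"
    # utf8 converts values to UTF-8; -esc_msb stops escaping of bytes
    # with the top bit set, so a UTF-8 terminal can render them.
    name_options = [layout, "utf8", "-esc_msb"]
    if show_type:
        name_options.append("show_type")   # also print the ASN.1 string type
    return [
        "openssl", "x509", "-in", cert_path, "-noout",
        "-subject", "-issuer",
        "-nameopt", ",".join(name_options),
    ]
```

Running the list with `subprocess.run(cmd, check=True)` assumes `openssl` is on the PATH and that the terminal is in UTF-8 mode.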

Steve.
--
Dr Stephen N. Henson. OpenSSL project core developer.
Commercial tech support now available see: http://www.openssl.org