OpenSSL Selection of Text Encoding for the -out and -text Options

OpenSSL - User mailing list
I'm working on an ACME client written in Python3. I expect the certificate sent by the ACME server will be in utf-8 per RFC 8555, sec. 5. It seems from Python Standard Library function sys.getfilesystemencoding() that a filesystem has a particular encoding for filesystem names (which is not an explicit default for text files). I wonder if OpenSSL (and generally other software) automatically uses the filesystem name encoding by default for all text output. I don't see anything about text encoding on the "Compilation and Installation" wiki page. I have OpenSSL from a Debian package. I don't see anything about text encoding in the configuration file /etc/ssl/openssl.cnf.

What is/are and how does OpenSSL choose the text encodings for -out and -text, respectively. Information about line encoding selection would be a nice bonus. I would guess that line encoding is determined by the OS target and is essentially hardcoded to the package or source code distribution. I would like to have all my related domain certification files in the same text encoding and to decode the -text output into a string value as reliably (and as transparently to the user) as possible. My fallback position is of course to just hardcode utf-8. I would like to avoid that unless it's the smart thing to do. (I don't follow Windows, but I know it used to favor utf-16.) Thanks.

Douglas Morris

Re: OpenSSL Selection of Text Encoding for the -out and -text Options

Viktor Dukhovni
On Sun, Jan 19, 2020 at 02:51:53AM +0000, Douglas Morris via openssl-users wrote:

> I'm working on an ACME client written in Python3. I expect the
> certificate sent by the ACME server will be in utf-8 per RFC 8555,
> sec. 5.

Certificates are in DER or PEM form, not utf-8.  Some strings in the
certificate might be UTF-8, but that does not look relevant here.
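Because a certificate arrives either as DER (binary) or as PEM (ASCII text), a client can inspect the leading bytes rather than guess at a text encoding. Below is a minimal heuristic sketch in Python. It assumes only the standard PEM armor line and the fact that a DER certificate is an ASN.1 SEQUENCE, whose first byte is 0x30. `sniff_cert_format` is a hypothetical helper name, and PEM input with explanatory text before the BEGIN line would be reported as "unknown".

```python
def sniff_cert_format(data: bytes) -> str:
    """Guess whether certificate bytes are PEM or DER (heuristic only)."""
    # PEM is ASCII text with a "-----BEGIN ..." armor line; tolerate leading whitespace.
    if data.lstrip().startswith(b"-----BEGIN"):
        return "PEM"
    # A DER-encoded certificate is an ASN.1 SEQUENCE, whose tag byte is 0x30.
    if data[:1] == b"\x30":
        return "DER"
    return "unknown"


if __name__ == "__main__":
    print(sniff_cert_format(b"-----BEGIN CERTIFICATE-----\n"))  # PEM
    print(sniff_cert_format(b"\x30\x82\x01\x0a"))  # DER
```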

> It seems from Python Standard Library function
> sys.getfilesystemencoding() that a filesystem has a particular
> encoding for filesystem names (which is not an explicit default for
> text files).

File system metadata (file names, ...) is distinct from file content.

> I wonder if OpenSSL (and generally other software) automatically uses
> the filesystem name encoding by default for all text output.

This makes no sense.  OpenSSL does not display filenames, it reads
data from files given to it via API calls and command-line options.

> I don't see anything about text encoding on the "Compilation and
> Installation" wiki page. I have OpenSSL from a Debian package. I don't
> see anything about text encoding in the configuration file
> /etc/ssl/openssl.cnf.

The issue does not come up.  OpenSSL functions that take filename
arguments use the verbatim C character arrays passed to them in API
calls.  The names are byte arrays, not strings subject to encoding and
decoding.
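On the Python side, the analogous step is `os.fsencode()`. It turns a `str` path into the byte array that the operating system (and any C library, such as OpenSSL) actually receives, using the encoding reported by `sys.getfilesystemencoding()`. The file's contents are a separate matter and can be written and read as raw bytes. The minimal sketch below illustrates this; the file name, the placeholder contents and the helper `roundtrip_demo` are assumptions made for the example.

```python
import os
import sys
import tempfile
from typing import Tuple


def roundtrip_demo() -> Tuple[bytes, bytes]:
    """Show that the filename encoding and the file content are independent."""
    content = b"-----BEGIN CERTIFICATE-----\n"  # placeholder bytes, not a real certificate
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.pem")
        # The name is encoded with the filesystem encoding before it reaches the OS.
        name_bytes = os.fsencode(path)
        # The content is written and read back verbatim in binary mode.
        with open(path, "wb") as f:
            f.write(content)
        with open(name_bytes, "rb") as f:  # a bytes path is accepted directly
            read_back = f.read()
    return name_bytes, read_back


if __name__ == "__main__":
    print("filesystem encoding:", sys.getfilesystemencoding())
    print(roundtrip_demo())
```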

> What is/are and how does OpenSSL choose the text encodings for -out
> and -text, respectively.

No encoding at all.

> Information about line encoding selection would be a nice bonus.

DER files are binary, and PEM files are text files.  The platform's C
library normally determines how line-oriented data is written to files.

OpenSSL's BIO abstraction over files generally uses STDIO to perform
the underlying I/O.  So line endings are a feature of the C-library,
not OpenSSL.
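Because the line terminator is whatever the writer's C library produced, a robust reader can accept both LF and CRLF. The minimal Python sketch below assumes PEM input that may use either convention. `pem_lines` is a hypothetical helper. Python's own text-mode `open()` with the default `newline=None` performs similar translation.

```python
from typing import List


def pem_lines(data: bytes) -> List[str]:
    """Split PEM bytes into lines, accepting LF, CRLF, or bare CR endings."""
    # bytes.splitlines() treats b"\n", b"\r\n", and b"\r" as line boundaries and
    # drops the terminators, so the result does not depend on the writer's platform.
    # PEM is ASCII, so a strict ASCII decode is sufficient.
    return [line.decode("ascii") for line in data.splitlines()]


if __name__ == "__main__":
    unix = b"-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
    dos = unix.replace(b"\n", b"\r\n")
    print(pem_lines(unix) == pem_lines(dos))  # True
```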

> I would like to have all my related domain certification files in the
> same text encoding and to decode the -text output into a string value
> as reliably (and as transparently to the user) as possible. My
> fallback position is of course to just hardcode utf-8.

Here, you seem to be confusing file name encodings with file content.
PEM files are base64-encoded ASCII.  As for the output of "x509 -text",
there are various options (such as -nameopt and -certopt) to control
the output format.
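Python's standard `ssl` module makes the "base64-encoded ASCII" point concrete. `ssl.DER_cert_to_PEM_cert()` produces an ASCII `str`, and `ssl.PEM_cert_to_DER_cert()` turns that string back into binary DER. For brevity, the sketch uses a few placeholder bytes in place of a real certificate. These helpers only handle the PEM armor and the base64 layer and do not parse the ASN.1 inside.

```python
import ssl

# Placeholder bytes standing in for a DER certificate (not a valid certificate).
der = b"\x30\x03\x02\x01\x01"

# DER -> PEM: base64 of the binary data wrapped in BEGIN/END armor lines.
pem = ssl.DER_cert_to_PEM_cert(der)
# Every character of the PEM text is 7-bit ASCII.
assert all(ord(c) < 128 for c in pem)
print(pem)

# PEM -> DER: converting back recovers the original binary bytes exactly.
assert ssl.PEM_cert_to_DER_cert(pem) == der
```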

At this time, you really should be using UTF-8 unconditionally.
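To follow that advice in a Python client, one option is to capture the `-text` output as bytes and decode it explicitly as UTF-8 rather than rely on locale defaults. The minimal sketch below assumes an `openssl` binary on `PATH`. `x509_text` and `decode_openssl_output` are hypothetical helper names. Using `errors="replace"`, so that unexpected bytes cannot crash the client, is a design assumption and not OpenSSL behavior.

```python
import subprocess
from typing import List, Optional


def decode_openssl_output(raw: bytes) -> str:
    """Decode command output as UTF-8, replacing any invalid byte sequences."""
    return raw.decode("utf-8", errors="replace")


def x509_text(path: str, extra_args: Optional[List[str]] = None) -> str:
    """Return the `openssl x509 -text` dump of a certificate file as a str."""
    cmd = ["openssl", "x509", "-in", path, "-noout", "-text"]
    if extra_args:
        cmd.extend(extra_args)  # caller-supplied display options, passed through unchanged
    # Capture stdout as bytes (no text=True), so Python applies no locale-dependent decoding.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
    return decode_openssl_output(result.stdout)
```

For example, `x509_text("cert.pem")` returns a `str` whatever the user's locale is. How non-ASCII names are escaped in that output is controlled by the `-nameopt` option of `openssl x509`, so check the man page for your OpenSSL version.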

--
    Viktor.