How to validate UTF-8 in a file


Silvia Gisela Pavon Velasco




Hello,

I would like some advice on how to validate that a file is in UTF-8
format. I have set the proper Unix environment variables to work with
UTF-8; however, I still need to check whether a given file is actually
in that format.

Regards,

Silvia Pavón
_________________________________________________________________________________
NOTE:  The information in this email is proprietary and confidential. This
message is for the designated recipient only, if you are not the intended
recipient, you should destroy it immediately. Any information in this
message shall not be understood as given or endorsed by Alestra, its
subsidiaries or their employees, unless expressly so stated. It is the
responsibility of the recipient to ensure that this email is virus free,
therefore neither Alestra, its subsidiaries nor their employees accept any
responsibility.

______________________________________________________________________
OpenSSL Project                                 http://www.openssl.org
User Support Mailing List                    [hidden email]
Automated List Manager                           [hidden email]
Re: How to validate UTF-8 in a file

Peter BENKO,VSE IT Sluzby,+421-55-610-2045,+421-903-855532
On Wed, Aug 17, 2005 at 10:01:26AM -0500, Silvia Gisela Pavon Velasco wrote:

> Hello,
>
> I would like some advice about how can I validate that a file is in utf-8
> format. I have set the proper unix environment variables to work with the
> utf-8 format; but however, I have the need to validate if a file is in that
> format.
On Linux, you can use the 'file' command to check whether a file is
UTF-8.

Example:
file aaa.txt
aaa.txt: UTF-8 Unicode English text
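
If the check has to happen inside a program rather than at the shell, one option is to decode the whole file strictly and see whether it fails. The following is only a minimal sketch in Python; the function name is_valid_utf8 and the chunk size are illustrative assumptions, not something from this thread.

```python
import codecs


def is_valid_utf8(path, chunk_size=65536):
    """Return True if every byte of the file decodes as strict UTF-8.

    Hypothetical helper for illustration; reads the file in chunks so
    large files do not have to fit in memory.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                # The incremental decoder buffers a multi-byte sequence
                # that happens to be split across two chunks.
                decoder.decode(chunk)
        # final=True reports a multi-byte sequence cut off at end of file.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True
```

Note that a pure ASCII file also passes this check, since ASCII is a subset of UTF-8. Passing only shows the bytes are well-formed UTF-8, not that the author intended that encoding.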

RE: How to validate UTF-8 in a file

mclellan_dave
In reply to this post by Silvia Gisela Pavon Velasco
The file command probably recognizes the UTF-8 Byte Order Mark, as it does
other magic numbers.

The UTF-8 BOM is 0xEFBBBF, a signature that indicates the encoding of the file
is UTF-8.  If you have an application that is reading the file and needs to
know, read the first three bytes of the file and act accordingly.

FWIW:  Here's a useful URL: http://www.unicode.org/faq/utf_bom.html#22
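
A BOM test like the one described could look roughly like this in Python. This is a sketch only: the function name has_utf8_bom is an assumption made for illustration, and codecs.BOM_UTF8 is the standard library constant for the bytes EF BB BF. Keep in mind that the BOM is optional in UTF-8, so a missing BOM does not mean the file is in some other encoding.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file starts with the 3-byte UTF-8 BOM (EF BB BF).

    Hypothetical helper for illustration. A False result says nothing
    about the encoding, because UTF-8 files may omit the BOM.
    """
    with open(path, "rb") as handle:
        return handle.read(3) == codecs.BOM_UTF8
```

Because many UTF-8 files carry no BOM, this test is best treated as a hint and combined with a full decode of the contents when a real validation is needed.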

Dave McLellan --Consulting Software Engineer - SPEA Engineering
EMC Corporation
228 South St. Mail Stop: 228 LL/AA-24
Hopkinton, MA 01748  USA
+1-508-249-1257 F: +1-508-497-8030  [hidden email]


-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Peter BENKO,VSE IT
Sluzby,+421-55-610-2045,+421-903-855532
Sent: Wednesday, August 17, 2005 1:11 PM
To: [hidden email]
Subject: Re: How to validate UTF-8 in a file

On Wed, Aug 17, 2005 at 10:01:26AM -0500, Silvia Gisela Pavon Velasco wrote:
> Hello,
>
> I would like some advice about how can I validate that a file is in utf-8
> format. I have set the proper unix environment variables to work with the
> utf-8 format; but however, I have the need to validate if a file is in that
> format.
On Linux, you can use the 'file' command to check whether a file is
UTF-8.

Example:
file aaa.txt
aaa.txt: UTF-8 Unicode English text

RE: How to validate UTF-8 in a file

Silvia Gisela Pavon Velasco




Thanks for the advice.

Silvia Pavón


In reply to:
From: "mclellan, dave" <mclellan_dave@emc.com>
Sent by: owner-openssl-use[hidden email]
To: "'[hidden email]'" <[hidden email]>
Subject: RE: How to validate UTF-8 in a file
Date: 17/08/2005 12:44 p.m.
Please respond to: openssl-users@openssl.org



