Friday, July 10, 2020

CSV delimiter Linux vs Mac: Resolving the issue of comma-separated file read and write

Excel can open and save CSV (Comma-Separated Values) files, which use a comma as the delimiter.
This format is often used for exchanging data between programs, but issues come up when the same file is used on different operating systems.


As per my research, Excel includes three versions of the CSV format:


  • CSV (Comma delimited) (*.csv)
  • CSV (Macintosh) (*.csv)
  • CSV (MS-DOS) (*.csv)
The main difference is the line terminator. In the Macintosh format, each record (each line in the file) is terminated with a carriage return (CR), as the classic Mac OS expected. On Windows, lines are terminated with a carriage return and line feed combination (CRLF). A file saved with one convention can therefore be misread on a system that expects the other.
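
To check which terminator a given file actually uses, you can count the raw bytes. The following is only a minimal sketch: the function name detect_line_ending and the sample records are made up for illustration, and it simply reports the most frequent terminator.

```python
def detect_line_ending(data: bytes) -> str:
    """Return 'CRLF', 'CR', 'LF' or 'none' for the most frequent terminator."""
    crlf = data.count(b"\r\n")
    # Bare CR and bare LF are the totals minus those that belong to a CRLF pair.
    cr = data.count(b"\r") - crlf
    lf = data.count(b"\n") - crlf
    counts = {"CRLF": crlf, "CR": cr, "LF": lf}
    name = max(counts, key=counts.get)
    return name if counts[name] > 0 else "none"


# Hypothetical sample records, one per convention.
print(detect_line_ending(b"id,name\r\n1,Ann\r\n"))  # Windows-style CSV
print(detect_line_ending(b"id,name\r1,Ann\r"))      # classic Macintosh-style CSV
print(detect_line_ending(b"id,name\n1,Ann\n"))      # Unix-style CSV
```

For a real file, read it in binary mode (rb) and pass the bytes to the function, so that Python does not translate the line endings first.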


For example, if you are using Python's csv reader or writer, you should take care of two points:

1. Use wb mode when opening the file for writing.

https://docs.python.org/2/library/csv.html#csv.writer


If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference.

That is, open the file in binary mode: wb when writing and rb when reading.
This advice applies to Python 2, which is the version the link above documents.

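2. Use the proper line separator. Rather than hard-coding it, take it from os.linesep and pass it to csv.writer as the lineterminator parameter (the csv default is '\r\n').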
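
The link above is for Python 2. In Python 3 the csv module documentation asks for a text-mode file opened with newline='' instead of the 'b' flag. Here is a minimal sketch of both points under that assumption. The helper names write_rows and read_rows, the sample rows and the file name demo.csv are all made up for illustration.

```python
import csv
import os
import tempfile


def write_rows(path, rows, lineterminator="\r\n"):
    """Write rows to path using exactly the given line terminator."""
    # newline="" stops Python from translating the terminator a second time.
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, lineterminator=lineterminator).writerows(rows)


def read_rows(path):
    """Read all rows back as lists of strings."""
    # newline="" hands the raw line endings to the csv module untouched.
    with open(path, "r", newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


rows = [["id", "name"], ["1", "Ann"]]            # hypothetical sample data
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv")         # hypothetical file name
    write_rows(path, rows, lineterminator=os.linesep)
    print(read_rows(path) == rows)
```

Passing os.linesep writes the separator of whichever platform runs the script. If the file is meant for one specific platform, pass that platform's terminator explicitly instead.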



If it is only a one-time change, you can also fix the columns while viewing the file:


Open your csv file with WPS Spreadsheets

Select the A column -> Data -> Text to Columns -> Next -> Select or define your delimiter -> Finish
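
The same "define your delimiter" step can be done in Python by passing the delimiter to csv.reader. This is just a sketch: the helper name split_columns and the semicolon-separated sample are assumptions for illustration.

```python
import csv
import io


def split_columns(text, delimiter=","):
    """Parse CSV text into rows using the given single-character delimiter."""
    # newline="" keeps the original line endings so the csv module handles them.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# Hypothetical semicolon-separated sample with Windows line endings.
sample = "id;name\r\n1;Ann\r\n"
print(split_columns(sample, delimiter=";"))
```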


You can also view a related post:

Cheers ;)

By the way I quit smoking. Did you know?

Line ending difference - Windows vs Linux/Mac: Line breaks (\n, CRLF) explained, with resolutions in Python, Java, JavaScript, C++, Node.js etc.


The Windows environment and the UNIX environment use different end-of-line characters. So, if you share files between the two, one environment sees characters at the end of each line of text that it does not normally expect, and programs that read the file there may show stray characters or give errors.

In general, z/OS UNIX text files contain a newline character at the end of each line. In ASCII, newline is X'0A'. In EBCDIC, newline is X'15'. (For example, ASCII code page ISO8859-1 and EBCDIC code page IBM-1047 translate back and forth between these characters.) Windows programs normally use a carriage return followed by a line feed character at the end of each line of a text file. In ASCII, carriage return/line feed is X'0D'/X'0A'. In EBCDIC, carriage return/line feed is X'0D'/X'15'.



As per my research, a line break is written differently depending on the platform you are using:


  • Windows: '\r\n'
  • Mac (OS 9 and earlier): '\r'
  • Mac (OS X and later): '\n'
  • Unix/Linux: '\n'
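
When a file may come from any of these platforms, one option is to convert every convention in the list above to a single one before processing. A minimal sketch follows; the function name normalize_newlines and the mixed sample string are made up for illustration.

```python
def normalize_newlines(text: str, target: str = "\n") -> str:
    """Convert CRLF, bare CR and bare LF line endings in text to target."""
    # Replace CRLF first so that its CR and LF are not converted separately.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    return unified if target == "\n" else unified.replace("\n", target)


mixed = "windows\r\nold mac\runix\n"  # hypothetical sample with mixed endings
print(repr(normalize_newlines(mixed)))
print(repr(normalize_newlines(mixed, target="\r\n")))
```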


Resolution:

Let the program find out which operating system it is running on and take the line separator value from there, instead of hard-coding it. The following are the ways to get it in some common programming languages:


Python:

import os
os.linesep
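
In Python, a file opened in text mode already translates '\n' to os.linesep when writing, so the value is rarely needed by hand. To force one specific ending on every platform, open() accepts a newline argument. A minimal sketch, where the helper name write_lines and the file name demo.txt are assumptions for illustration:

```python
import os
import tempfile


def write_lines(path, lines, newline=None):
    """Write each item of lines to path as one line of text."""
    # newline=None (the default) translates each LF to os.linesep on write;
    # newline="\r\n" or newline="\n" forces that ending on every platform.
    with open(path, "w", newline=newline, encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name
    write_lines(path, ["first", "second"], newline="\r\n")
    with open(path, "rb") as f:           # binary mode shows the raw bytes
        print(f.read())
    print(repr(os.linesep))               # separator of the current platform
```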

Java:

System.lineSeparator()

NodeJS:

require('os').EOL

C++:

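Standard C++ has no constant for the platform line separator. Writing '\n' to a stream opened in text mode is translated to the platform's line ending by the runtime.
Note that QDir::separator() in Qt returns the directory (path) separator, not the line separator.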

C#:

string eol = Environment.NewLine;


Cheers :)