Read CSV File in Python From Desktop
CSV (comma-separated value) files are a common file format for transferring and storing data. The ability to read, process, and write data to and from CSV files using Python is a key skill to master for any data scientist or business analyst. In this post, we'll go over what CSV files are, how to read CSV files into Pandas DataFrames, and how to write DataFrames back to CSV files post analysis.
Pandas is the most popular data manipulation package in Python, and DataFrames are the Pandas data type for storing tabular 2D data.
- Load CSV files to Python Pandas
- 1. File Extensions and File Types
- 2. Data Representation in CSV files
- Other Delimiters / Separators – TSV files
- Delimiters in Text Fields – Quotechar
- 3. Python – Paths, Folders, Files
- Finding your Python Path
- File Loading: Absolute and Relative Paths
- 4. Pandas CSV File Loading Errors
- Advanced Read CSV Files
- Specifying Data Types
- Skipping and Picking Rows and Columns From File
- Custom Missing Value Symbols
- CSV Format Advantages and Disadvantages
- Additional Reading
Load CSV files to Python Pandas
The basic process of loading data from a CSV file into a Pandas DataFrame (with all going well) is achieved using the "read_csv" function in Pandas:
    # Load the Pandas libraries with alias 'pd'
    import pandas as pd

    # Read data from file 'filename.csv'
    # (in the same directory that your python process is based)
    # Control delimiters, rows, column names with read_csv (see later)
    data = pd.read_csv("filename.csv")

    # Preview the first 5 lines of the loaded data
    data.head()
While this code seems simple, an understanding of four key concepts is required to fully grasp and debug the operation of the data loading procedure if you encounter issues:
- Understanding file extensions and file types – what do the letters CSV really mean? What's the difference between a .csv file and a .txt file?
- Understanding how data is represented within CSV files – if you open a CSV file, what does the data really look like?
- Understanding the Python path and how to reference a file – what is the absolute and relative path to the file you are loading? What directory are you working in?
- CSV data formats and errors – common errors with the function.
Each of these topics is discussed below, and we end this tutorial by looking at some more advanced CSV loading mechanisms and giving some broad advantages and disadvantages of the CSV format.
1. File Extensions and File Types
The first step to working with comma-separated-value (CSV) files is understanding the concept of file types and file extensions.
- Data is stored on your computer in individual "files", or containers, each with a different name.
- Each file contains data of different types – the internals of a Word document are quite different from the internals of an image.
- Computers decide how to read files using the "file extension", that is, the code that follows the dot (".") in the filename.
- So, a filename is typically in the form "<random name>.<file extension>". Examples:
  - project1.DOCX – a Microsoft Word file called Project1.
  - shanes_file.TXT – a simple text file called shanes_file.
  - IMG_5673.JPG – an image file called IMG_5673.
- Other well known file types and extensions include: XLSX – Excel, PDF – Portable Document Format, PNG – images, ZIP – compressed file format, GIF – animation, MPEG – video, MP3 – music, etc.
- A CSV file is a file with a ".csv" file extension, e.g. "data.csv", "super_information.csv". The "CSV" in this case lets the computer know that the data contained in the file is in "comma separated value" format, which we'll discuss below.
File extensions are hidden by default on a lot of operating systems. The first step that any self-respecting engineer, software engineer, or data scientist will take on a new computer is to ensure that file extensions are shown in their Explorer (Windows) or Finder (Mac) windows.

To check if file extensions are showing in your system, create a new text document with Notepad (Windows) or TextEdit (Mac) and save it to a folder of your choice. If you can't see the ".txt" extension in your folder when you view it, you will have to change your settings.
- In Microsoft Windows: Open Control Panel > Appearance and Personalization. Now, click on Folder Options (or File Explorer Options, as it is now called) > View tab. In this tab, under Advanced Settings, you will see the option "Hide extensions for known file types". Uncheck this option and click on Apply and OK.
- In Mac OS: Open Finder > in the menu, click Finder > Preferences, click Advanced, and select the checkbox for "Show all filename extensions".
2. Data Representation in CSV files
A "CSV" file, that is, a file with a "csv" filetype, is a basic text file. Any text editor, such as Notepad on Windows or TextEdit on Mac, can open a CSV file and show the contents. Sublime Text is a wonderful and multi-functional text editor option for any platform.
CSV is a standard for storing tabular data in text format, where commas are used to separate the different columns, and newlines (carriage return / pressing enter) are used to separate rows. Typically, the first row in a CSV file contains the names of the columns for the data.
An example table data set and the corresponding CSV-format data is shown in the diagram below.

Note that almost any tabular data can be stored in CSV format – the format is popular because of its simplicity and flexibility. You can create a text file in a text editor, save it with a .csv extension, and open that file in Excel or Google Sheets to see the table form.
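To make the mapping between CSV text and a table concrete, here is a minimal sketch. The column names and values are made up for illustration, and io.StringIO is used only so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content: the first row is the header,
# every following row is one record, commas separate the columns.
csv_text = """name,age,city
Alice,34,Dublin
Bob,27,Cork
"""

# io.StringIO lets read_csv treat the string as if it were a file.
table = pd.read_csv(io.StringIO(csv_text))

print(table)                   # the tabular view of the same data
print(table.columns.tolist())  # column names taken from the first row
```

Saving the same three lines of text as a .csv file and passing the filename to read_csv should give the same table.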
Other Delimiters / Separators – TSV files
The comma separation scheme is by far the most popular method of storing tabular data in text files.
However, the choice of the ',' comma character to delimit columns is arbitrary, and it can be substituted where needed. Popular alternatives include tab ("\t") and semi-colon (";"). Tab-separated files are known as TSV (Tab-Separated Value) files.
When loading data with Pandas, the read_csv function is used for reading any delimited text file; the delimiter is changed using the sep parameter.
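As a rough illustration of the sep parameter, the sketch below loads the same made-up data once from tab-separated text and once from semicolon-separated text. The sample strings are assumptions; only read_csv and sep come from Pandas itself.

```python
import io

import pandas as pd

# The same hypothetical data, written with two different delimiters.
tsv_text = "name\tage\nAlice\t34\nBob\t27\n"
ssv_text = "name;age\nAlice;34\nBob;27\n"

# sep tells read_csv which character separates the columns.
from_tabs = pd.read_csv(io.StringIO(tsv_text), sep="\t")
from_semicolons = pd.read_csv(io.StringIO(ssv_text), sep=";")

# Both calls should produce identical DataFrames.
print(from_tabs.equals(from_semicolons))
```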
Delimiters in Text Fields – Quotechar
One complexity in creating CSV files arises if you have commas, semicolons, or tabs actually in one of the text fields that you want to store. In this case, it's important to use a "quote character" in the CSV file to enclose these fields.
The quote character can be specified in Pandas.read_csv using the quotechar argument. By default (as with many systems), it's set to the standard quotation mark ("). Any commas (or other delimiters as demonstrated below) that occur between two quote characters will be ignored as column separators.
In the example shown, a semicolon-delimited file, with quotation marks as a quotechar, is loaded into Pandas and shown in Excel. The use of the quotechar allows the "NickName" column to contain semicolons without being split into more columns.
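The quoting behaviour described above can be sketched as follows. The rows are invented for illustration; the "NickName" column mirrors the example in the text.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data. The first nickname contains a
# semicolon, so the field is wrapped in double quotes.
ssv_text = (
    "Name;NickName;Age\n"
    'Shane;"Shaney; The Boss";30\n'
    'Anna;"Annie";25\n'
)

# quotechar='"' is already the default; it is spelled out here for clarity.
quoted = pd.read_csv(io.StringIO(ssv_text), sep=";", quotechar='"')

print(quoted)
print(quoted.shape)  # the quoted semicolon does not create a fourth column
```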
3. Python – Paths, Folders, Files
When you specify a filename to Pandas.read_csv, Python will look in your "current working directory". Your working directory is typically the directory that you started your Python process or Jupyter notebook from.

Finding your Python Path
Your Python path can be displayed using the built-in os module. The os module brings operating-system-dependent functionality into Python programs and scripts.
To find your current working directory, the function required is os.getcwd(). The os.listdir() function can be used to display all files in a directory, which is a good check to see if the CSV file you are loading is in the directory as expected.
    # Find out your current working directory
    import os
    print(os.getcwd())
    # Out: /Users/shane/Documents/blog

    # Display all of the files found in your current working directory
    print(os.listdir(os.getcwd()))
    # Out: ['test_delimted.ssv', 'CSV Blog.ipynb', 'test_data.csv']
In the example above, my current working directory is the '/Users/shane/Documents/blog' directory. Any files that are placed in this directory will be immediately available to the Python file open() function or the Pandas read_csv function.
Instead of moving the required data files to your working directory, you can also change your current working directory to the directory where the files reside using os.chdir().
File Loading: Absolute and Relative Paths
When specifying file names to the read_csv function, you can supply either absolute or relative file paths.
- A relative path is the path to the file if you start from your current working directory. In relative paths, typically the file will be in a subdirectory of the working directory and the path will not start with a drive specifier, e.g. data/test_file.csv. The characters '..' are used to move to a parent directory in a relative path.
- An absolute path is the complete path from the base of your file system to the file that you want to load, e.g. c:/Documents/Shane/data/test_file.csv. Absolute paths will start with a drive specifier (c:/ or d:/ in Windows, or '/' in Mac or Linux).
It's recommended and preferred to use relative paths where possible in applications, because absolute paths are unlikely to work on different computers due to different directory structures.
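The difference between the two kinds of path can be sketched with the standard-library pathlib module. The folder and file names below (including a "Desktop" folder under the home directory) are assumptions for illustration and may not exist on your machine.

```python
from pathlib import Path

# Relative path: interpreted from the current working directory.
relative_path = Path("data") / "test_file.csv"

# Absolute path: built here from the user's home directory.
# The "Desktop" folder and the file name are hypothetical.
absolute_path = Path.home() / "Desktop" / "test_file.csv"

print(relative_path.is_absolute())  # False: depends on where Python was started
print(absolute_path.is_absolute())  # True: complete path from the file system root
print(relative_path.resolve())      # the relative path expanded against the cwd

# Either form can be passed straight to Pandas, e.g. pd.read_csv(absolute_path)
```

Building paths with pathlib rather than by joining strings avoids having to hard-code "/" or "\" separators for a particular operating system.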

4. Pandas CSV File Loading Errors
The most common errors you'll get while loading data from CSV files into Pandas will be:
- FileNotFoundError: File b'filename.csv' does not exist
  A File Not Found error is typically an issue with path setup, current directory, or file name confusion (file extension can play a part here!)
- UnicodeDecodeError: 'utf-8' codec can't decode byte in position : invalid continuation byte
  A Unicode Decode Error is typically caused by not specifying the encoding of the file, and happens when you have a file with non-standard characters. For a quick fix, try opening the file in Sublime Text, and re-saving with encoding 'UTF-8'.
- pandas.parser.CParserError: Error tokenizing data.
  Parse Errors can be caused in unusual circumstances to do with your data format – try to add the parameter "engine='python'" to the read_csv function call; this changes the data reading function internally to a slower but more stable method.
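One way to react to these three errors in code is sketched below. The helper name load_csv_safely and the latin-1 fallback encoding are assumptions made for this example, not part of Pandas; recent Pandas versions expose the parser error as pandas.errors.ParserError, which is what the sketch catches.

```python
import pandas as pd


def load_csv_safely(path):
    """Hypothetical helper: try read_csv and respond to the common errors."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        # Usually a wrong working directory, path, or file extension.
        print(f"Could not find {path!r}; check the path and file name.")
        return None
    except UnicodeDecodeError:
        # Assumption: the file uses a single-byte encoding such as latin-1.
        return pd.read_csv(path, encoding="latin-1")
    except pd.errors.ParserError:
        # Retry with the slower but more tolerant Python parsing engine.
        return pd.read_csv(path, engine="python")


result = load_csv_safely("this_file_does_not_exist.csv")
print(result)
```

The fallback encoding is only a guess; if you know how the file was exported, passing that encoding directly to read_csv is the more reliable fix.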
Advanced Read CSV Files
There are some additional flexible parameters in the Pandas read_csv() function that are useful to have in your arsenal of data science techniques:
Specifying Data Types
As mentioned before, CSV files do not contain any type information for data. Data types are inferred through examination of the top rows of the file, which can lead to errors. To manually specify the data types for different columns, the dtype parameter can be used with a dictionary of column names and data types to be applied, for example: dtype={"name": str, "age": np.int32}.
Note that for dates and date times, the format, columns, and other behaviour can be adjusted using the parse_dates, date_parser, dayfirst, and keep_date_col parameters.
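A minimal sketch of specifying types at load time follows. The column names and sample rows are invented for illustration; dtype and parse_dates are the read_csv parameters discussed above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data with a text column, an integer column, and a date column.
csv_text = (
    "name,age,birth_date\n"
    "Alice,34,1990-01-15\n"
    "Bob,27,1997-06-30\n"
)

typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"name": str, "age": np.int32},  # explicit types instead of inference
    parse_dates=["birth_date"],            # parse this column as datetimes
)

print(typed.dtypes)
```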
Skipping and Picking Rows and Columns From File
The nrows parameter specifies how many rows from the top of the CSV file to read, which is useful to take a sample of a large file without loading it completely. Similarly, the skiprows parameter allows you to specify rows to leave out, either at the start of the file (provide an int), or throughout the file (provide a list of row indices). Similarly, the usecols parameter can be used to specify which columns in the data to load.
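The three parameters can be tried out on a small in-memory example. The data below is made up; note that when skiprows is given a list, the indices count lines in the file starting at 0, so the header is line 0.

```python
import io

import pandas as pd

# Hypothetical four-row data set.
csv_text = "id,name,score\n1,Alice,90\n2,Bob,85\n3,Cara,70\n4,Dan,60\n"

# nrows: read only the first two data rows.
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)

# skiprows with a list: skip file line 2 (header is line 0, Alice is line 1).
without_bob = pd.read_csv(io.StringIO(csv_text), skiprows=[2])

# usecols: load only the named columns.
names_scores = pd.read_csv(io.StringIO(csv_text), usecols=["name", "score"])

print(first_two)
print(without_bob)
print(names_scores)
```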
Custom Missing Value Symbols
When data is exported to CSV from different systems, missing values can be specified with different tokens. The na_values parameter allows you to customise the characters that are recognised as missing values. The default values interpreted as NA/NaN are: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.
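As a small sketch of custom missing-value tokens, the example below treats '.' and '??' as missing in addition to the defaults. The data is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical export where missing salaries appear as '??', '.', or 'NA'.
csv_text = "name,salary\nAlice,50000\nBob,??\nCara,.\nDan,NA\n"

# na_values adds extra tokens on top of the default NA markers.
cleaned = pd.read_csv(io.StringIO(csv_text), na_values=[".", "??"])

print(cleaned)
print(cleaned["salary"].isna().sum())  # number of salaries read as missing
```

Because the column now contains missing values, Pandas stores it as floating point rather than integer.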
    # Advanced CSV loading example
    data = pd.read_csv(
        "data/files/complex_data_example.tsv",     # relative python path to subdirectory
        sep='\t',                                   # Tab-separated value file.
        quotechar="'",                              # single quote allowed as quote character
        dtype={"salary": int},                      # Parse the salary column as an integer
        usecols=['name', 'birth_date', 'salary'],   # Only load the three columns specified.
        parse_dates=['birth_date'],                 # Interpret the birth_date column as a date
        skiprows=10,                                # Skip the first 10 rows of the file
        na_values=['.', '??']                       # Take any '.' or '??' values as NA
    )
CSV Format Advantages and Disadvantages
As with all technical decisions, storing your data in CSV format has both advantages and disadvantages. Be aware of the potential pitfalls and issues that you will encounter as you load, store, and exchange data in CSV format:
On the plus side:
- CSV format is universal and the data can be loaded by almost any software.
- CSV files are simple to understand and debug with a basic text editor.
- CSV files are quick to create and load into memory before analysis.
Nevertheless, the CSV format has some negative sides:
- There is no data type information stored in the text file; all typing (dates, int vs float, strings) is inferred from the data only.
- There's no formatting or layout information storable – things like fonts, borders, and column width settings from Microsoft Excel will be lost.
- File encodings can become a problem if there are non-ASCII compatible characters in text fields.
- CSV format is inefficient; numbers are stored as characters rather than binary values, which is wasteful. You will find, however, that your CSV data compresses well using zip compression.
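The compression point in the list above can be checked with a short sketch that writes a DataFrame back to CSV, once as plain text and once zip-compressed. The data and file names are invented, and a temporary folder is used so nothing is left on disk.

```python
import os
import tempfile

import pandas as pd

# Hypothetical numeric data set with 1000 rows.
df = pd.DataFrame({"id": range(1000), "value": [0.5] * 1000})

with tempfile.TemporaryDirectory() as folder:
    plain_path = os.path.join(folder, "numbers.csv")
    zipped_path = os.path.join(folder, "numbers.csv.zip")

    # index=False leaves the row index out of the file.
    df.to_csv(plain_path, index=False)
    df.to_csv(zipped_path, index=False, compression="zip")

    plain_size = os.path.getsize(plain_path)
    zipped_size = os.path.getsize(zipped_path)

    # read_csv infers zip compression from the ".zip" extension.
    restored = pd.read_csv(zipped_path)

print(plain_size, zipped_size)  # file sizes in bytes
print(restored.equals(df))      # whether the round trip preserved the data
```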
As an aside, in an effort to counter some of these disadvantages, two prominent data science developers in both the R and Python ecosystems, Wes McKinney and Hadley Wickham, recently introduced the Feather Format, which aims to be a fast, simple, open, flexible and multi-platform data format that supports multiple data types natively.
Additional Reading
- Official Pandas documentation for the read_csv function.
- Python 3 Notes on file paths, working directories, and using the OS module.
- Datacamp Tutorial on loading CSV files, including some additional OS commands.
- PythonHow Loading CSV tutorial.
Source: https://www.shanelynn.ie/python-pandas-read-csv-load-data-from-csv-files/