import pandas as pd
import numpy as np
!pip install openpyxl
Collecting openpyxl
Downloading openpyxl-3.0.9-py2.py3-none-any.whl (242 kB)
|████████████████████████████████| 242 kB 33.6 MB/s
Collecting et-xmlfile
Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.0.9
arr = np.random.randint(low = 10, high = 100, size = (20, 10))
df = pd.DataFrame(data = arr)
df.shape
print(pd.ExcelWriter.__doc__)
Class for writing DataFrame objects into excel sheets.
Default is to use xlwt for xls, openpyxl for xlsx, odf for ods.
See DataFrame.to_excel for typical usage.
The writer should be used as a context manager. Otherwise, call `close()` to save
and close any opened file handles.
Parameters
----------
path : str or typing.BinaryIO
Path to xls or xlsx or ods file.
engine : str (optional)
Engine to use for writing. If None, defaults to
``io.excel.<extension>.writer``. NOTE: can only be passed as a keyword
argument.
.. deprecated:: 1.2.0
As the `xlwt <https://pypi.org/project/xlwt/>`__ package is no longer
maintained, the ``xlwt`` engine will be removed in a future
version of pandas.
date_format : str, default None
Format string for dates written into Excel files (e.g. 'YYYY-MM-DD').
datetime_format : str, default None
Format string for datetime objects written into Excel files.
(e.g. 'YYYY-MM-DD HH:MM:SS').
mode : {'w', 'a'}, default 'w'
File mode to use (write or append). Append does not work with fsspec URLs.
.. versionadded:: 0.24.0
storage_options : dict, optional
Extra options that make sense for a particular storage connection, e.g.
host, port, username, password, etc., if using a URL that will
be parsed by ``fsspec``, e.g., starting "s3://", "gcs://".
.. versionadded:: 1.2.0
Attributes
----------
None
Methods
-------
None
Notes
-----
None of the methods and properties are considered public.
For compatibility with CSV writers, ExcelWriter serializes lists
and dicts to strings before writing.
Examples
--------
Default usage:
>>> with ExcelWriter('path_to_file.xlsx') as writer:
... df.to_excel(writer)
To write to separate sheets in a single file:
>>> with ExcelWriter('path_to_file.xlsx') as writer:
... df1.to_excel(writer, sheet_name='Sheet1')
... df2.to_excel(writer, sheet_name='Sheet2')
You can set the date format or datetime format:
>>> with ExcelWriter('path_to_file.xlsx',
... date_format='YYYY-MM-DD',
... datetime_format='YYYY-MM-DD HH:MM:SS') as writer:
... df.to_excel(writer)
You can also append to an existing Excel file:
>>> with ExcelWriter('path_to_file.xlsx', mode='a') as writer:
... df.to_excel(writer, sheet_name='Sheet3')
You can store Excel file in RAM:
>>> import io
>>> buffer = io.BytesIO()
>>> with pd.ExcelWriter(buffer) as writer:
... df.to_excel(writer)
You can pack Excel file into zip archive:
>>> import zipfile
>>> with zipfile.ZipFile('path_to_file.zip', 'w') as zf:
... with zf.open('filename.xlsx', 'w') as buffer:
... with pd.ExcelWriter(buffer) as writer:
... df.to_excel(writer)
with pd.ExcelWriter('myexcel.xlsx') as writer:
df.to_excel(writer)
print(pd.DataFrame.__doc__)
Two-dimensional, size-mutable, potentially heterogeneous tabular data.
Data structure also contains labeled axes (rows and columns).
Arithmetic operations align on both row and column labels. Can be
thought of as a dict-like container for Series objects. The primary
pandas data structure.
Parameters
----------
data : ndarray (structured or homogeneous), Iterable, dict, or DataFrame
Dict can contain Series, arrays, constants, dataclass or list-like objects. If
data is a dict, column order follows insertion-order.
.. versionchanged:: 0.25.0
If data is a list of dicts, column order follows insertion-order.
index : Index or array-like
Index to use for resulting frame. Will default to RangeIndex if
no indexing information part of input data and no index provided.
columns : Index or array-like
Column labels to use for resulting frame. Will default to
RangeIndex (0, 1, 2, ..., n) if no column labels are provided.
dtype : dtype, default None
Data type to force. Only a single dtype is allowed. If None, infer.
copy : bool, default False
Copy data from inputs. Only affects DataFrame / 2d ndarray input.
See Also
--------
DataFrame.from_records : Constructor from tuples, also record arrays.
DataFrame.from_dict : From dicts of Series, arrays, or dicts.
read_csv : Read a comma-separated values (csv) file into DataFrame.
read_table : Read general delimited file into DataFrame.
read_clipboard : Read text from clipboard into DataFrame.
Examples
--------
Constructing DataFrame from a dictionary.
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
col1 col2
0 1 3
1 2 4
Notice that the inferred dtype is int64.
>>> df.dtypes
col1 int64
col2 int64
dtype: object
To enforce a single dtype:
>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1 int8
col2 int8
dtype: object
Constructing DataFrame from numpy ndarray:
>>> df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
... columns=['a', 'b', 'c'])
>>> df2
a b c
0 1 2 3
1 4 5 6
2 7 8 9
Constructing DataFrame from dataclass:
>>> from dataclasses import make_dataclass
>>> Point = make_dataclass("Point", [("x", int), ("y", int)])
>>> pd.DataFrame([Point(0, 0), Point(0, 3), Point(2, 3)])
x y
0 0 0
1 0 3
2 2 3
classes = np.tile(['good', 'bad', 'neutral'], 100)[:20]
classes = classes[np.random.randint(low = 0, high = 20, size = 20)]
df['Classes'] = classes
df.head()
df['Classes'], _ = pd.factorize(df['Classes'])
df.head()
df_2 = df.copy()[:15]
df.shape
df_2.shape
### Concatenation of dataframes
pd.concat(objs =[df, df_2.sort_index(ascending = False)], axis = 1)
angles = np.array([90, 180, 270, 360])
angles = np.deg2rad(angles)
vector = np.random.randint(low = 10, high = 100, size = (10))
vsum = np.sqrt(np.sum(vector**2))
vsum
vsum*vsum/np.dot(vector, vector)
np.cos(angles)
### Merge dataframes
pd.merge(left = df, right = df_2, how = 'outer')
all(['', 'dsfbujdfbvklfnbvk'])
any(df)
try_df = pd.concat(objs =[df, df_2.sort_index(ascending = False)], axis = 1)
all(try_df)
pd.isna(try_df)
try_df[pd.isna(try_df)]
try_df.fillna(value = 'Missing', inplace = True)
try_df
all(try_df)
pd.isna(try_df).sum()