pymicra.algs package



pymicra.algs.auxiliar.applyResult(result, failed, df, control=None, testname=None, filename=None, failshow=False, index_n=None)

Auxiliary function to be used with util.qcontrol

  • result (bool) – whether the test failed or succeeded
  • failed (list) – list of failed variables. None object if the test was successful
  • control (dictionary) – dictionary whose keys are the names of the tests and items are lists
  • testname (string) – name of the test (has to match control dict)
  • filename (string) – name or path or identifier of the file tested
  • failshow (bool) – whether to show the failed variables or not

Returns first and last lines of a file
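
As a rough illustration of the idea (not the package's own code), the first and last lines of a text file can be read in a single pass. The name `first_last_lines` and the temporary demo file are assumptions made for this sketch.

```python
import os
import tempfile


def first_last_lines(path):
    """Return (first_line, last_line) of a text file, read in one pass."""
    with open(path) as handle:
        first = handle.readline()
        last = first  # a one-line file has identical first and last lines
        for line in handle:
            last = line
    return first, last


# Demo on a throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header\nrow 1\nrow 2\n")
demo_lines = first_last_lines(tmp.name)
os.remove(tmp.name)
print(demo_lines)
```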


Calculates the length of a year in days. Useful to figure out if a certain year is a leap year.
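
A minimal sketch of the same calculation using only the standard library. The name `year_length` is hypothetical and this is not necessarily how the package computes it.

```python
import calendar


def year_length(year):
    """Return the number of days in a Gregorian calendar year."""
    return 366 if calendar.isleap(year) else 365


# A year is a leap year exactly when its length is 366 days.
print(year_length(2015), year_length(2016))
```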

pymicra.algs.auxiliar.stripDown(str, final='', args=['_', '-'])

Auxiliary function to strip down keywords from symbols

pymicra.algs.auxiliar.testValid(df_valid, testname='', failverbose=True, passverbose=True, filepath=None)

Tests a boolean DataFrame obtained from the test and prints standard output

  • df_valid (pandas.Series) – series containing only True or False values for each of the variables, which should be the indexes
  • testname (string) – the name of the test that generated the True/False values
  • failverbose (bool) – whether to return which variables caused a false result
  • passverbose (bool) – whether to print something for successful cases

  • result (bool) – True if the run passed the test
  • failed (list) – list of failed variables if result==False. None otherwise.
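
The return convention described above can be sketched as follows. This is an illustration only: it takes a plain dict instead of a pandas.Series, and the name `check_valid_sketch` and the printed messages are assumptions.

```python
def check_valid_sketch(valid, testname="", failverbose=True, passverbose=True):
    """Return (result, failed) from a mapping of variable name -> bool."""
    failed = [name for name, ok in valid.items() if not ok]
    if failed:
        if failverbose:
            print(f"{testname}: failed variables: {failed}")
        return False, failed
    if passverbose:
        print(f"{testname}: passed")
    # On success there are no failed variables to report.
    return True, None


outcome = check_valid_sketch({"u": True, "v": False, "w": True}, testname="spikes")
```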


pymicra.algs.general.classbin(x, y, bins_number=100, function=<function mean>, xfunction=<function mean>, logscale=True)

Separates x and y inputs into bins based on the x array. x and y do not have to be ordered.

  • x (np.array) – independent variable
  • y (np.array) – dependent variable
  • bins_number (int) – number of classes (or bins) desired
  • function (callable) – function to be applied to both x and y-bins in order to smooth the data
  • logscale (boolean) – whether or not to use a log-spaced scale to set the bins

  • np.array – x binned
  • np.array – y binned
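
A dependency-free sketch of this kind of binning, assuming equal-width bins on either a linear or a log10 scale and the mean as the smoothing function. The name `classbin_sketch` is hypothetical, empty bins are simply dropped, and the package itself works on NumPy arrays, so details may differ.

```python
import math
from statistics import mean


def classbin_sketch(x, y, bins_number=100, logscale=True):
    """Average x and y inside equal-width bins defined on x."""
    # Log-spaced bins require strictly positive x values.
    scale = math.log10 if logscale else float
    lo, hi = scale(min(x)), scale(max(x))
    width = (hi - lo) / bins_number  # assumes x is not constant
    bins = {}
    for xi, yi in zip(x, y):
        # The maximum of x falls on the last edge, so clamp it into the last bin.
        idx = min(int((scale(xi) - lo) / width), bins_number - 1)
        bins.setdefault(idx, []).append((xi, yi))
    x_binned = [mean(p[0] for p in bins[i]) for i in sorted(bins)]
    y_binned = [mean(p[1] for p in bins[i]) for i in sorted(bins)]
    return x_binned, y_binned


# The inputs are deliberately unordered.
x = [8, 1, 5, 2, 7, 3, 6, 4]
y = [80, 10, 50, 20, 70, 30, 60, 40]
x_binned, y_binned = classbin_sketch(x, y, bins_number=2, logscale=False)
print(x_binned, y_binned)
```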

pymicra.algs.general.diff_central(x, y)

Applies the central finite difference scheme

  • x (array) – independent variable
  • y (array) – dependent variable

dydx – the dependent variable differentiated

Return type:
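
For reference, a central finite difference can be sketched in plain Python as below. The handling of the two end points (one-sided differences) is an assumption of this sketch, since the text does not say what the package does there, and `diff_central_sketch` is a hypothetical name.

```python
def diff_central_sketch(x, y):
    """Approximate dy/dx with central differences on a possibly uneven grid."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two points")
    dydx = []
    for i in range(n):
        lo = max(i - 1, 0)      # falls back to a forward difference at i = 0
        hi = min(i + 1, n - 1)  # falls back to a backward difference at the end
        dydx.append((y[hi] - y[lo]) / (x[hi] - x[lo]))
    return dydx


# y = x**2, so the interior values equal 2*x exactly.
slopes = diff_central_sketch([0, 1, 2, 3, 4], [0, 1, 4, 9, 16])
print(slopes)
```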



Returns the length of a file by piping through the shell’s wc command

Parameters: fname (string) – path of the file
pymicra.algs.general.find_nearest(array, value)

Finds the index of the element of an array that is nearest to a given value

  • array (array) – list or array
  • value (float) – value to look for in the array
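
A minimal sketch of a nearest-value lookup, assuming ties are resolved in favour of the first match. The name `find_nearest_index` is hypothetical.

```python
def find_nearest_index(array, value):
    """Return the index of the element of `array` closest to `value`."""
    return min(range(len(array)), key=lambda i: abs(array[i] - value))


nearest = find_nearest_index([0.0, 2.5, 5.0, 7.5], 5.9)
print(nearest)
```
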
pymicra.algs.general.fitByDate(data, degree=1, rule=None)

Given a pandas DataFrame with a datetime index, this routine fits an n-degree polynomial to the dataset

  • data (pd.DataFrame, pd.Series) – dataframe whose columns have to be fitted
  • degree (int) – degree of the polynomial. Default is 1.
  • rule (str) – pandas offset string. Ex.: “10min”.
pymicra.algs.general.fitWrap(x, y, degree=1)

A wrapper around numpy.polyfit and numpy.polyval that fits data given x and y arrays. It is specifically designed to be used with the pandas.DataFrame.apply method

  • x (array, list) –
  • y (array, list) –
  • degree (int) –
pymicra.algs.general.get_index(x, to_look_for)

Just like the .index method of lists, except it works for multiple values

  • x (list or array) – the main array
  • to_look_for (list or array) – the subset of the main whose indexes are desired

indexes – array with the indexes in x of each element of to_look_for

Return type:
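
The behaviour described above can be sketched with a dictionary of first occurrences. `get_index_sketch` is a hypothetical name, and a value missing from `x` raises KeyError here, which may differ from the package.

```python
def get_index_sketch(x, to_look_for):
    """Return the position in x of every element of to_look_for."""
    positions = {}
    for i, value in enumerate(x):
        positions.setdefault(value, i)  # keep the first occurrence, like list.index
    return [positions[value] for value in to_look_for]


found = get_index_sketch(["u", "v", "w", "theta"], ["w", "u"])
print(found)
```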



Auxiliary function to retrieve notation

pymicra.algs.general.inverse_normal_cdf(mu, sigma)

Applies the inverse normal cumulative distribution

  • mu – mean
  • sigma – standard deviation

pymicra.algs.general.latexify(variables, math_mode=True)
pymicra.algs.general.limitedSubs(data, max_interp=3, func=<function <lambda>>)

Substitutes elements with NaNs if the condition given by func is met at most max_interp times in a row. If there are more than that number in a row, they are not substituted.

  • data (pandas.DataFrame) – data to be interpolated
  • max_interp (int) – maximum number of NaNs in a row to interpolate
  • func (function) – function of x only that determines which elements become NaNs. Should return only True or False.

df – dataframe with the elements substituted

Return type:
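
The run-length rule can be sketched on a plain list as follows. This is not the package's implementation: `limited_subs_sketch` and the default `func` (flagging absolute values above 100) are assumptions chosen only for the demonstration.

```python
import math


def limited_subs_sketch(values, max_interp=3, func=lambda v: abs(v) > 100):
    """Replace flagged values by NaN only in runs of at most max_interp."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if not func(out[i]):
            i += 1
            continue
        j = i
        while j < n and func(out[j]):
            j += 1  # j ends one past the last flagged value of this run
        if j - i <= max_interp:
            for k in range(i, j):
                out[k] = math.nan
        i = j
    return out


cleaned = limited_subs_sketch([1, 999, 2, 999, 999, 999, 999, 3], max_interp=3)
print(cleaned)
```

In this example the isolated 999 becomes NaN, while the run of four flagged values exceeds max_interp and is left untouched.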


pymicra.algs.general.limited_interpolation(data, maxcount=3)

Interpolates linearly, but only if the gap is smaller than or equal to maxcount

  • data (pandas.DataFrame) – dataset to interpolate
  • maxcount (int) – maximum number of consecutive NaNs to interpolate. If a gap is larger than that, nothing is done with its points.
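
A plain-Python sketch of gap-limited linear interpolation on evenly spaced samples. The name `limited_interpolation_sketch` is hypothetical, and leaving leading or trailing gaps untouched is an assumption of this sketch.

```python
import math


def limited_interpolation_sketch(values, maxcount=3):
    """Linearly fill interior NaN gaps that are at most maxcount long."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if not math.isnan(out[i]):
            i += 1
            continue
        j = i
        while j < n and math.isnan(out[j]):
            j += 1  # j ends one past the last NaN of this gap
        gap = j - i
        # Only fill gaps that are short enough and have a valid neighbour on both sides.
        if gap <= maxcount and i > 0 and j < n:
            left, right = out[i - 1], out[j]
            for k in range(i, j):
                out[k] = left + (right - left) * (k - i + 1) / (gap + 1)
        i = j
    return out


nan = math.nan
filled = limited_interpolation_sketch([1.0, nan, 3.0, nan, nan, nan, nan, 8.0], maxcount=3)
print(filled)
```
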
pymicra.algs.general.line2date(line, dlconfig)

Gets a date from a line of file according to dataloggerConfig object.

  • line (string) – line of file with date inside
  • dlconfig (pymicra.dataloggerConfig) – configuration of the datalogger


Return type:

datetime object

pymicra.algs.general.mad(data, axis=None)
pymicra.algs.general.name2date(filename, dlconfig)

Gets a date from the name of the file according to a datalogger config object

  • filename (string) – the (base) name of the file
  • dlconfig (pymicra.dataloggerConfig) – configuration of the datalogger

  • cdate (datetime object)
  • Warning: Needs to be optimized in order to read question markers also after the date

pymicra.algs.general.parseDates(data, dataloggerConfig=None, date_col_names=None, clean=True, verbose=False, connector='')

Author: Tomas Chor
Date: 2015-08-10

This routine parses the date from a pandas DataFrame when it is divided into several columns

  • data (pandas DataFrame) – dataFrame whose dates have to be parsed
  • date_col_names (list) – list of the names of the columns into which the date is divided. The names of the date columns must follow the datetime format directives, so if the first column holds only the year, its name must be %Y, and so forth; see
  • connector (string) – should be used only when the default connector causes some conflict
  • first_time_skip (int) – the offset (mostly because of the bad converting done by LBA
  • clean (bool) – remove date columns from data after it is introduced as index

data indexed by timestamp

Return type:
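
The column-naming convention can be illustrated with the standard library alone. In this sketch the rows are plain dicts rather than a DataFrame, the name `parse_dates_sketch` is hypothetical, and a "-" connector is used instead of the documented default so that the joined string is unambiguous.

```python
from datetime import datetime


def parse_dates_sketch(rows, date_col_names, connector="-", clean=True):
    """Build a timestamp per row from columns named after datetime directives."""
    fmt = connector.join(date_col_names)  # e.g. "%Y-%m-%d-%H%M"
    parsed = []
    for row in rows:
        text = connector.join(str(row[name]) for name in date_col_names)
        stamp = datetime.strptime(text, fmt)
        # With clean=True the date columns are dropped once the timestamp exists.
        rest = {k: v for k, v in row.items() if not (clean and k in date_col_names)}
        parsed.append((stamp, rest))
    return parsed


rows = [
    {"%Y": "2015", "%m": "08", "%d": "10", "%H%M": "1430", "u": 1.2},
    {"%Y": "2015", "%m": "08", "%d": "10", "%H%M": "1500", "u": 0.9},
]
indexed = parse_dates_sketch(rows, ["%Y", "%m", "%d", "%H%M"])
print(indexed[0])
```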


pymicra.algs.general.resample(df, rule, how=None, **kwargs)

Extends pandas resample methods to index made of integers

pymicra.algs.general.splitData(data, rule='30min', return_index=False, **kwargs)

Splits a given pandas DataFrame into a series of “rule”-spaced DataFrames

  • data (pandas dataframe) – data to be split
  • rule (str or int) –
    If it is a string, it should be a pandas string offset.
    Some possible values (which may be preceded by an integer, as in “30min”) are:
      • D – calendar day frequency
      • W – weekly frequency
      • M – month end frequency
      • MS – month start frequency
      • Q – quarter end frequency
      • BQ – business quarter end frequency
      • QS – quarter start frequency
      • A – year end frequency
      • AS – year start frequency
      • H – hourly frequency
      • T – minutely frequency
      • Min – minutely frequency
      • S – secondly frequency
      • L – milliseconds
      • U – microseconds

    If it is an int, it should be the number of lines desired in each separated piece.

    If it is None, then the dataframe isn’t separated and a list containing only the full dataframe is returned.

    check it complete at
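
The integer and None cases of `rule` can be sketched on a plain list of rows as below. The name `split_by_lines` is hypothetical, and the string (time offset) case is left out because it depends on pandas.

```python
def split_by_lines(rows, rule=None):
    """Split rows into consecutive pieces of `rule` lines each."""
    if rule is None:
        return [rows]  # no splitting: a list holding only the full dataset
    return [rows[i:i + rule] for i in range(0, len(rows), rule)]


pieces = split_by_lines(list(range(7)), rule=3)
print(pieces)
```

The last piece is shorter whenever the number of rows is not a multiple of `rule`.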



pymicra.algs.units.add(elems, units, inplace_units=False, unitdict=None, key=None)

Add elements considering their units

pymicra.algs.units.convert_cols(data, guide, units, inplace_units=False)

Converts data from one unit to the other

  • data (pandas.DataFrame) – data to be changed from one unit to the other
  • guide (dict) – {names of columns : units to convert to}
  • units (dict) – units dictionary
  • inplace_units (bool) – if inunit is a dict, the dict is updated in place. “key” keyword must be provided
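
To make the roles of `guide` and `units` concrete, here is a simplified sketch that does not use pint. The conversion table `FACTORS`, the name `convert_cols_sketch` and the choice to return a new units dictionary rather than update it in place are all assumptions of this example.

```python
# Hypothetical conversion factors; the package relies on pint for this.
FACTORS = {("kilometer", "meter"): 1000.0, ("hour", "minute"): 60.0}


def convert_cols_sketch(data, guide, units):
    """Convert the columns named in guide and return (new_data, new_units)."""
    new_data = {name: list(values) for name, values in data.items()}
    new_units = dict(units)  # copy, so the caller's dictionary is left untouched
    for name, target in guide.items():
        factor = FACTORS[(units[name], target)]
        new_data[name] = [value * factor for value in data[name]]
        new_units[name] = target
    return new_data, new_units


data = {"distance": [1.5, 2.0], "duration": [0.5, 1.0]}
units = {"distance": "kilometer", "duration": "hour"}
converted, new_units = convert_cols_sketch(data, {"distance": "meter"}, units)
print(converted["distance"], new_units)
```
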
pymicra.algs.units.convert_indexes(data, guide, units, inplace_units=False)

Converts data from one unit to the other

  • data (pandas.Series) – data to be changed from one unit to the other
  • guide (dict) – {names of columns : units to convert to}
  • units (dict) – units dictionary
  • inplace_units (bool) – if inunit is a dict, the dict is updated in place. “key” keyword must be provided
pymicra.algs.units.convert_to(data, inunit, outunit, inplace_units=False, key=None)

Converts data from one unit to the other

  • data (pandas.Series) – data to be changed from one unit to the other
  • inunit (pint.quantity or dict) – unit(s) that the data is in
  • outunit (str) – convert to this unit
  • inplace_units (bool) – if inunit is a dict, the dict is updated in place. “key” keyword must be provided
  • key (str) – if inunit is a dict, it is the name of the variable to be changed
pymicra.algs.units.divide(elems, units, inplace_units=False, unitdict=None, key=None)

Divide elements considering their units

pymicra.algs.units.multiply(elems, units, inplace_units=False, unitdict=None, key=None)

Multiply elements considering their units

pymicra.algs.units.operate(elems, units, inplace_units=False, unitdict=None, key=None, operation='+')

Operate on elements considering their units

  • elems (list, tuple) – list of pandas.Series
  • units (list, tuple) – list of pint.units ordered as the elems list
  • inplace_units (bool) – sets dictionary inplace_units
  • unitdict (dict) – dict to be set inplace
  • key (str) – name of variables to be set inplace as dict key

Gets unit from string, list of strings, or dict’s values, using the UnitRegistry defined in

pymicra.algs.units.with_units(data, units)

Wrapper around toUnitsCsv to create a method to print the contents of a dataframe plus its units into a unitsCsv file.

  • data (pandas.DataFrame, pandas.Series) – dataframe or series to which units belong
  • units (dict) – dictionary with the names of each column and their unit

Module contents