dateparser package

Submodules

dateparser.conf module

dateparser's parsing behavior can be configured as shown below.

``PREFER_DAY_OF_MONTH`` defaults to ``current`` and also accepts ``first`` and ``last``:

>>> from dateparser.conf import settings
>>> from dateparser import parse
>>> parse(u'December 2015')
datetime.datetime(2015, 12, 16, 0, 0)
>>> settings.update('PREFER_DAY_OF_MONTH', 'last')
>>> parse(u'December 2015')
datetime.datetime(2015, 12, 31, 0, 0)
>>> settings.update('PREFER_DAY_OF_MONTH', 'first')
>>> parse(u'December 2015')
datetime.datetime(2015, 12, 1, 0, 0)
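
The three values can be pictured with a small standard-library sketch. The helper ``resolve_day`` and its parameters are hypothetical names introduced here for illustration; this is not dateparser's own implementation, only one way the results shown above could be reproduced.

```python
import calendar
from datetime import date


def resolve_day(year, month, preference, today):
    """Pick a day for a date string that names only a month and a year.

    Hypothetical helper for illustration; not part of dateparser.
    """
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    if preference == "first":
        return 1
    if preference == "last":
        return last_day
    if preference == "current":
        # Reuse today's day number, clamped so it stays valid in short months.
        return min(today.day, last_day)
    raise ValueError("unknown preference: %r" % (preference,))


today = date(2015, 12, 16)
print(resolve_day(2015, 12, "current", today))  # 16
print(resolve_day(2015, 12, "last", today))     # 31
print(resolve_day(2015, 12, "first", today))    # 1
```

The clamping in the ``current`` branch is an assumption of this sketch; the text above does not say what happens when today's day number does not exist in the target month.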

``PREFER_DATES_FROM`` defaults to ``current_period`` and also accepts ``past`` and ``future``. Assuming the current date is June 16, 2015:

>>> from dateparser.conf import settings
>>> from dateparser import parse
>>> parse(u'March')
datetime.datetime(2015, 3, 16, 0, 0)
>>> settings.update('PREFER_DATES_FROM', 'future')
>>> parse(u'March')
datetime.datetime(2016, 3, 16, 0, 0)
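
A rough way to think about this setting is as a rule for choosing the year when the string leaves it out. The sketch below uses only the standard library; ``resolve_month_only`` is a hypothetical name, and the handling of ``past`` is an assumption, since only ``current_period`` and ``future`` are demonstrated above.

```python
import calendar
from datetime import date


def resolve_month_only(month, today, preference="current_period"):
    """Choose a year for a date string that names only a month.

    Hypothetical helper for illustration; not part of dateparser.
    """
    def candidate(year):
        # Keep today's day number, clamped to the length of the target month.
        day = min(today.day, calendar.monthrange(year, month)[1])
        return date(year, month, day)

    result = candidate(today.year)
    if preference == "future" and result < today:
        result = candidate(today.year + 1)  # already passed this year
    elif preference == "past" and result > today:
        result = candidate(today.year - 1)  # not reached yet this year
    return result


today = date(2015, 6, 16)
print(resolve_month_only(3, today))            # 2015-03-16
print(resolve_month_only(3, today, "future"))  # 2016-03-16
print(resolve_month_only(8, today, "past"))    # 2014-08-16
```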

``SKIP_TOKENS`` is a list of tokens to discard while detecting the language. It defaults to ['t'], which skips the T in ISO format datetime strings, e.g. 2015-05-02T10:20:19+0000. This only works with DateDataParser, as shown below:

>>> settings.update('SKIP_TOKENS', ['de'])  # Turkish word for 'at'
>>> from dateparser.date import DateDataParser
>>> DateDataParser().get_date_data(u'27 Haziran 1981 de')  # Turkish (at 27 June 1981)
{'date_obj': datetime.datetime(1981, 6, 27, 0, 0), 'period': 'day'}
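
To illustrate the idea of discarding tokens before language detection, here is a minimal standard-library sketch. ``language_tokens`` is a hypothetical helper, and the tokenization rule (runs of letters only) is an assumption made for the example rather than dateparser's actual logic.

```python
import re


def language_tokens(date_string, skip_tokens=("t",)):
    """Return the lowercase word tokens left after dropping skip tokens.

    Hypothetical helper for illustration; not part of dateparser.
    """
    # Runs of letters only: digits, underscores and punctuation are ignored.
    words = re.findall(r"[^\W\d_]+", date_string.lower())
    return [word for word in words if word not in skip_tokens]


print(language_tokens("2015-05-02T10:20:19+0000"))    # []
print(language_tokens("27 Haziran 1981 de", ["de"]))  # ['haziran']
print(language_tokens("27 Haziran 1981 de"))          # ['haziran', 'de']
```

With the default skip list, the ISO string leaves no words for a language detector to stumble over; with 'de' skipped, only the Turkish month name remains.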

class dateparser.conf.Settings(**kwargs)

Bases: object

PREFER_DATES_FROM = 'current_period'
PREFER_DAY_OF_MONTH = 'current'
SKIP_TOKENS = ['t']
SUPPORT_BEFORE_COMMON_ERA = False
update(key, value)
dateparser.conf.reload_settings()

dateparser.date module

class dateparser.date.DateDataParser(languages=None, allow_redetect_language=False)

Bases: object

Class which handles language detection, translation and subsequent generic parsing of a string representing a date and/or time.

Parameters:
  • languages (list) – A list of two-letter language codes, e.g. ['en', 'es']. If languages are given, the parser will not attempt to detect the language.
  • allow_redetect_language (bool) – Enables/disables language re-detection.
Returns:

A parser instance

Raises:

ValueError - Unknown Language, TypeError - Languages argument must be a list

get_date_data(date_string, date_formats=None)

Parse a string representing a date and/or time in recognizable localized formats. Supports parsing multiple languages and timezones.

Parameters:
  • date_string (str|unicode) – A string representing date and/or time in a recognizably valid format.
  • date_formats (list) – A list of format strings using the directives of Python's strftime()/strptime(). The parser applies the formats one by one, taking into account the detected languages.
Returns:

A dict holding the parsed datetime.datetime object under 'date_obj' and its granularity under 'period'. For example: {'date_obj': datetime.datetime(2015, 6, 1, 0, 0), 'period': u'day'}

Raises:

ValueError - Unknown Language

Note

Period values can be 'day' (default), 'week', 'month' or 'year'.

Period represents the granularity of date parsed from the given string.

In the example below, no day information is present, so the day is taken from the current date (June 16, 2015, at the time of writing), giving day 16. Hence, the level of precision is month.

>>> DateDataParser().get_date_data(u'March 2015')
{'date_obj': datetime.datetime(2015, 3, 16, 0, 0), 'period': u'month'}

Similarly, for date strings with neither day nor month information, the level of precision is year, and day 16 and month 6 are taken from the current date.

>>> DateDataParser().get_date_data(u'2014')
{'date_obj': datetime.datetime(2014, 6, 16, 0, 0), 'period': u'year'}
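
The relationship between missing fields and the reported period can be sketched with the standard library alone. ``parse_with_period`` is a hypothetical helper that handles a single strptime format and assumes the format contains a year; it is meant to mirror the two examples above, not to describe how dateparser derives the period internally.

```python
import calendar
from datetime import datetime


def parse_with_period(date_string, fmt, now):
    """Parse with one strptime format and report the granularity.

    Hypothetical helper for illustration; not part of dateparser.
    """
    parsed = datetime.strptime(date_string, fmt)
    has_day = "%d" in fmt
    has_month = any(code in fmt for code in ("%m", "%b", "%B"))

    if has_day:
        period = "day"
    elif has_month:
        period = "month"
    else:
        period = "year"

    # Fields missing from the string are borrowed from the current date.
    month = parsed.month if has_month else now.month
    last_day = calendar.monthrange(parsed.year, month)[1]
    day = parsed.day if has_day else min(now.day, last_day)
    return {"date_obj": parsed.replace(month=month, day=day), "period": period}


now = datetime(2015, 6, 16)
print(parse_with_period("March 2015", "%B %Y", now))
# {'date_obj': datetime.datetime(2015, 3, 16, 0, 0), 'period': 'month'}
print(parse_with_period("2014", "%Y", now))
# {'date_obj': datetime.datetime(2014, 6, 16, 0, 0), 'period': 'year'}
```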

Dates with time zone indications or UTC offsets are returned in UTC time.

>>> DateDataParser().get_date_data(u'23 March 2000, 1:21 PM CET')
{'date_obj': datetime.datetime(2000, 3, 23, 14, 21), 'period': 'day'}

language_loader = <dateparser.languages.loader.LanguageDataLoader object>
dateparser.date.date_range(begin, end, **kwargs)
dateparser.date.get_date_from_timestamp(date_string)
dateparser.date.get_intersecting_periods(low, high, period='day')
dateparser.date.get_last_day_of_month(year, month)
dateparser.date.parse_with_formats(date_string, date_formats)

Parse with formats and return a dictionary with 'period' and 'date_obj'.

Returns: datetime.datetime, dict or None
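
Applying formats one by one can be pictured as a simple loop over ``datetime.strptime``. The sketch below is a simplification under stated assumptions: ``try_formats`` is a hypothetical helper that returns a bare datetime (or None) and leaves out the period and language handling described above.

```python
from datetime import datetime


def try_formats(date_string, date_formats):
    """Return the first successful strptime result, or None.

    Hypothetical helper for illustration; not part of dateparser.
    """
    for fmt in date_formats:
        try:
            return datetime.strptime(date_string, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    return None


formats = ["%Y-%m-%d", "%d/%m/%Y"]
print(try_formats("02/05/2015", formats))  # 2015-05-02 00:00:00
print(try_formats("not a date", formats))  # None
```

Because the first matching format wins, the order of the list matters when a string could satisfy more than one format.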
dateparser.date.sanitize_date(date_string)
dateparser.date.sanitize_spaces(html_string)

dateparser.date_parser module

class dateparser.date_parser.DateParser

Bases: object

parse(date_string)
dateparser.date_parser.dateutil_parse(date_string, **kwargs)

Wrapper function around dateutil.parser.parse

class dateparser.date_parser.new_parser(info=None)

Bases: dateutil.parser.parser

Implements an alternative parse method that supports preferring dates in the future or the past. For more, see issue #36.

static get_period(res)
static get_valid_day(res, new_date)
parse(timestr, default=None, ignoretz=False, **kwargs)

dateparser.freshness_date_parser module

class dateparser.freshness_date_parser.FreshnessDateDataParser(now=None)

Bases: object

Parses date strings like “1 year, 2 months ago” and “3 hours, 50 minutes ago”.

get_date_data(date_string)
get_kwargs(date_string)
parse(date_string)
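
The general idea behind parsing such relative strings can be sketched with a regular expression and ``datetime.timedelta``. ``parse_freshness``, ``UNITS`` and ``PATTERN`` are hypothetical names; the sketch covers English unit names only and leaves out years and months, because ``timedelta`` has no such units. It does not reflect how FreshnessDateDataParser is actually implemented.

```python
import re
from datetime import datetime, timedelta

# Units that map directly onto timedelta keyword arguments.
UNITS = ("week", "day", "hour", "minute", "second")
PATTERN = re.compile(r"(\d+)\s*(%s)s?\b" % "|".join(UNITS))


def parse_freshness(date_string, now):
    """Subtract the durations named in a relative date string from `now`.

    Hypothetical helper for illustration; not part of dateparser.
    Returns None when the string names no supported unit.
    """
    matches = PATTERN.findall(date_string.lower())
    if not matches:
        return None
    # Turn ('3', 'hour') into {'hours': 3} so it can be passed to timedelta.
    kwargs = {unit + "s": int(amount) for amount, unit in matches}
    return now - timedelta(**kwargs)


now = datetime(2015, 6, 16, 12, 0)
print(parse_freshness("3 hours, 50 minutes ago", now))  # 2015-06-16 08:10:00
print(parse_freshness("2 weeks ago", now))              # 2015-06-02 12:00:00
```

Passing ``now`` explicitly, in the spirit of the ``now=None`` constructor argument above, keeps the result reproducible in tests.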

dateparser.timezone_parser module

dateparser.timezone_parser.convert_to_local_tz(datetime_obj, datetime_tz_offset)
dateparser.timezone_parser.get_local_tz_offset()
dateparser.timezone_parser.get_tz_offsets()
dateparser.timezone_parser.pop_tz_offset_from_string(date_string, as_offset=True)

dateparser.timezones module

dateparser.utils module

dateparser.utils.get_logger()
dateparser.utils.increase_regex_replacements_group_positions(replacement, increment)
dateparser.utils.setup_logging()
dateparser.utils.wrap_replacement_for_regex(replacement, regex)

Module contents

dateparser.parse(date_string, date_formats=None, languages=None)

Parse date and time from given date string.

Parameters:
  • date_string (str|unicode) – A string representing date and/or time in a recognizably valid format.
  • date_formats (list) – A list of format strings using the directives of Python's strftime()/strptime(). The parser applies the formats one by one, taking into account the detected languages.
  • languages (list) – A list of two-letter language codes, e.g. ['en', 'es']. If languages are given, the parser will not attempt to detect the language.
Returns:

A datetime.datetime if successful, otherwise None.

Raises:

ValueError - Unknown Language