Section Extraction

Extract specific sections (ITEM 1, ITEM 1A, etc.) from 10-K, 10-Q, 8-K, and 20-F filings.

Why Extract Sections?

Instead of processing an entire 200-page filing, extract just the sections you need:

- Risk Factors (ITEM 1A)
- Business Description (ITEM 1)
- MD&A (ITEM 7)
- Financial Statements (ITEM 8)

Basic Usage

import sec2md
from sec2md import Item10K

pages = sec2md.parse_filing(filing_html)

sections = sec2md.extract_sections(pages, filing_type="10-K")

risk_factors = sec2md.get_section(sections, Item10K.RISK_FACTORS)
business = sec2md.get_section(sections, Item10K.BUSINESS)

print(risk_factors.markdown())
print(f"Pages: {risk_factors.page_range}")  # (12, 25)

Available Section Enums

10-K Sections (`Item10K`)

from sec2md import Item10K

# Part I
Item10K.BUSINESS                    # ITEM 1
Item10K.RISK_FACTORS                # ITEM 1A
Item10K.UNRESOLVED_STAFF_COMMENTS   # ITEM 1B
Item10K.CYBERSECURITY               # ITEM 1C
Item10K.PROPERTIES                  # ITEM 2
Item10K.LEGAL_PROCEEDINGS           # ITEM 3
Item10K.MINE_SAFETY                 # ITEM 4

# Part II
Item10K.MARKET_FOR_STOCK            # ITEM 5
Item10K.SELECTED_FINANCIAL_DATA     # ITEM 6
Item10K.MD_AND_A                    # ITEM 7
Item10K.MARKET_RISK                 # ITEM 7A
Item10K.FINANCIAL_STATEMENTS        # ITEM 8
Item10K.CHANGES_IN_ACCOUNTING       # ITEM 9
Item10K.CONTROLS_AND_PROCEDURES     # ITEM 9A
Item10K.OTHER_INFORMATION           # ITEM 9B
Item10K.CYBERSECURITY_DISCLOSURES   # ITEM 9C

# Part III
Item10K.DIRECTORS_AND_OFFICERS      # ITEM 10
Item10K.EXECUTIVE_COMPENSATION      # ITEM 11
Item10K.SECURITY_OWNERSHIP          # ITEM 12
Item10K.CERTAIN_RELATIONSHIPS       # ITEM 13
Item10K.PRINCIPAL_ACCOUNTANT        # ITEM 14

# Part IV
Item10K.EXHIBITS                    # ITEM 15
Item10K.FORM_10K_SUMMARY            # ITEM 16

10-Q Sections (`Item10Q`)

from sec2md import Item10Q

# Part I
Item10Q.FINANCIAL_STATEMENTS_P1         # ITEM 1 (Part I)
Item10Q.MD_AND_A_P1                     # ITEM 2 (Part I)
Item10Q.MARKET_RISK_P1                  # ITEM 3 (Part I)
Item10Q.CONTROLS_AND_PROCEDURES_P1      # ITEM 4 (Part I)

# Part II
Item10Q.LEGAL_PROCEEDINGS_P2            # ITEM 1 (Part II)
Item10Q.RISK_FACTORS_P2                 # ITEM 1A (Part II)
Item10Q.UNREGISTERED_SALES_P2           # ITEM 2 (Part II)
Item10Q.DEFAULTS_P2                     # ITEM 3 (Part II)
Item10Q.OTHER_INFORMATION_P2            # ITEM 5 (Part II)
Item10Q.EXHIBITS_P2                     # ITEM 6 (Part II)

Note: 10-Q filings reuse item numbers across Part I and Part II (for example, both parts have an ITEM 1), so the enum members carry `_P1` / `_P2` suffixes to tell them apart.
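To illustrate why the part matters, the sketch below indexes sections by item label alone and then by part and item together. `FakeSection` is a hypothetical stand-in, not part of sec2md; it models only the `part` and `item` attributes, and it assumes 10-Q sections report their part as "PART I" / "PART II".

```python
from dataclasses import dataclass


@dataclass
class FakeSection:
    """Hypothetical stand-in for a sec2md Section; only `part` and `item` are modelled."""
    part: str
    item: str


def index_by_part_and_item(sections):
    """Key each section by (part, item) so repeated item numbers stay distinct."""
    return {(s.part, s.item): s for s in sections}


sections = [
    FakeSection("PART I", "ITEM 1"),   # Financial Statements
    FakeSection("PART II", "ITEM 1"),  # Legal Proceedings
]

# Keying on the item label alone lets the second "ITEM 1" overwrite the first.
by_item_only = {s.item: s for s in sections}
by_part_and_item = index_by_part_and_item(sections)

print(len(by_item_only))      # 1
print(len(by_part_and_item))  # 2
```

If you build your own lookup tables on top of extracted 10-Q sections, keying on both attributes avoids silently dropping one of the two parts.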

8-K Items (`Item8K`)

from sec2md import Item8K

Item8K.MATERIAL_AGREEMENT           # 1.01
Item8K.TERMINATION_OF_AGREEMENT     # 1.02
Item8K.BANKRUPTCY                   # 1.03
Item8K.MINE_SAFETY                  # 1.04
Item8K.CYBERSECURITY_INCIDENT       # 1.05
Item8K.ACQUISITION_DISPOSITION      # 2.01
Item8K.RESULTS_OF_OPERATIONS        # 2.02
Item8K.DIRECT_OBLIGATION            # 2.03
Item8K.DELISTING                    # 3.01
Item8K.DIRECTOR_OFFICER_CHANGE      # 5.02
Item8K.AMENDMENTS_TO_ARTICLES       # 5.03
Item8K.REGULATION_FD                # 7.01
Item8K.OTHER_EVENTS                 # 8.01
Item8K.FINANCIAL_STATEMENTS_EXHIBITS  # 9.01
# ... and 27 more (41 total)

8-K sections include automatic exhibit parsing from Item 9.01:

sections = sec2md.extract_sections(pages, filing_type="8-K")

for section in sections:
    print(f"{section.item} — {section.item_title}")
    for ex in section.exhibits:
        print(f"  Exhibit {ex.exhibit_no}: {ex.description}")
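If you want the exhibits as flat rows (for a CSV or a DataFrame, say), a small helper along these lines may be enough. This is a sketch: the `SimpleNamespace` objects are hypothetical stand-ins for real sections, the "ITEM 9.01" label format is an assumption, and so is the idea that `exhibits` may be empty or `None` on sections other than Item 9.01.

```python
from types import SimpleNamespace


def exhibit_rows(sections):
    """Flatten every section's exhibits into (item, exhibit_no, description) tuples."""
    rows = []
    for section in sections:
        # `or []` tolerates a missing, None, or empty exhibits attribute (an assumption).
        for ex in getattr(section, "exhibits", None) or []:
            rows.append((section.item, ex.exhibit_no, ex.description))
    return rows


# Hypothetical stand-ins; real objects would come from extract_sections().
demo_sections = [
    SimpleNamespace(item="ITEM 2.02", exhibits=[]),
    SimpleNamespace(
        item="ITEM 9.01",
        exhibits=[
            SimpleNamespace(exhibit_no="99.1", description="Press release"),
            SimpleNamespace(exhibit_no="104", description="Cover page interactive data file"),
        ],
    ),
]

for item, number, description in exhibit_rows(demo_sections):
    print(f"{item}: Exhibit {number}: {description}")
```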

Filing Type Support

| Filing Type | Enum | Items |
| --- | --- | --- |
| `"10-K"` | `Item10K` | 18 items |
| `"10-Q"` | `Item10Q` | 11 items |
| `"8-K"` | `Item8K` | 41 items |
| `"20-F"` | (none) | Items 1–19, 16A–16I |
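No enum is listed for 20-F, so one possible way to pick out a single item is to match on the section's `item` label. The helper below is a hypothetical sketch, not a sec2md function, and it assumes 20-F sections expose `item` in the same "ITEM N" form shown for other filing types. The stand-in objects exist only to make the example runnable.

```python
from types import SimpleNamespace


def find_by_item(sections, label):
    """Return the first section whose `item` matches `label`, ignoring case and extra spaces."""
    wanted = " ".join(label.upper().split())
    for section in sections:
        current = " ".join((section.item or "").upper().split())
        if current == wanted:
            return section
    return None


# Hypothetical stand-ins; real sections would come from
# sec2md.extract_sections(pages, filing_type="20-F").
demo = [
    SimpleNamespace(item="ITEM 3", item_title="Key Information"),
    SimpleNamespace(item="ITEM 16A", item_title="Audit Committee Financial Expert"),
]

match = find_by_item(demo, "item 16a")
print(match.item_title if match else "not found")
```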

Section Object

Each Section contains:

section = sec2md.get_section(sections, Item10K.RISK_FACTORS)

section.part          # "PART I"
section.item          # "ITEM 1A"
section.item_title    # "Risk Factors"
section.pages         # List[Page] objects
section.page_range    # (start, end) tuple
section.tokens        # Total token count
section.markdown()    # Content as markdown string
section.exhibits      # List[Exhibit] (8-K Item 9.01)
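The `tokens` attribute is handy when sections have to fit a context window. As a rough sketch, the hypothetical helper below keeps sections in order while the running total stays within a budget. It assumes `tokens` is an integer and uses stand-in objects with made-up counts in place of real sections.

```python
from types import SimpleNamespace


def select_within_budget(sections, max_tokens):
    """Greedily keep sections, in order, while the running token total fits the budget."""
    kept, used = [], 0
    for section in sections:
        if used + section.tokens <= max_tokens:
            kept.append(section)
            used += section.tokens
    return kept, used


# Stand-in sections with illustrative token counts.
demo = [
    SimpleNamespace(item="ITEM 1", tokens=4_000),
    SimpleNamespace(item="ITEM 1A", tokens=12_000),
    SimpleNamespace(item="ITEM 7", tokens=6_000),
]

kept, used = select_within_budget(demo, max_tokens=10_000)
print([s.item for s in kept], used)  # ['ITEM 1', 'ITEM 7'] 10000
```

The greedy pass skips a section that is too large and keeps looking, which favors coverage over strict document order. Stopping at the first oversized section would be an equally reasonable choice.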

Working with Sections

Iterate All Sections

sections = sec2md.extract_sections(pages, filing_type="10-K")

for section in sections:
    print(f"{section.item}: {section.item_title}")
    print(f"  Pages {section.page_range[0]}-{section.page_range[1]}")
    print(f"  Tokens: {section.tokens}")

Extract Multiple Sections

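# Assumes `sections` was produced by sec2md.extract_sections(pages, filing_type="10-K")
sections_to_extract = [
    Item10K.BUSINESS,
    Item10K.RISK_FACTORS,
    Item10K.MD_AND_A,
]

# Map each enum member name (e.g. "RISK_FACTORS") to that section's markdown
extracted = {}
for item in sections_to_extract:
    section = sec2md.get_section(sections, item)
    if section:  # skip items that were not found in this filing
        extracted[item.name] = section.markdown()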

Save Sections to Files

sections = sec2md.extract_sections(pages, filing_type="10-K")

for section in sections:
    if section.item:
        filename = f"{section.item.replace(' ', '_')}.md"
        # Filings often contain non-ASCII characters, so set the encoding explicitly
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(section.markdown())
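Naming files by item alone works for a 10-K, but a 10-Q has an ITEM 1 in both parts, so the second file would overwrite the first. One way around this is to include the part in the name. `section_filename` below is a hypothetical helper, not part of sec2md, and it assumes `part` and `item` are plain strings (or `None` when absent).

```python
import re


def section_filename(part, item, suffix=".md"):
    """Build a filesystem-friendly name such as 'PART_II_ITEM_1A.md'."""
    label = " ".join(piece for piece in (part, item) if piece)
    # Collapse anything other than letters, digits, dots, and hyphens into underscores.
    slug = re.sub(r"[^A-Za-z0-9.\-]+", "_", label).strip("_")
    return f"{slug or 'section'}{suffix}"


print(section_filename("PART I", "ITEM 1"))   # PART_I_ITEM_1.md
print(section_filename("PART II", "ITEM 1"))  # PART_II_ITEM_1.md
print(section_filename(None, "ITEM 9.01"))    # ITEM_9.01.md
```

In the loop above, this would replace the `filename = ...` line with `filename = section_filename(section.part, section.item)`.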

Next Steps

Chunk sections - Split sections into chunks for embedding
API Reference - Full parameter docs