Hello! If anyone knows a way to get python-markdown to behave the way I'd like, or an alternative way to do it, I'd love some help. My use case: I'm converting .md files made with Obsidian into HTML files. Obsidian tags are a pound sign followed by the tag name (e.g. "#TagName"). When a tag is the first item on a line, the pound sign is mistaken for a heading marker, even though there is no space after it.

Is there a way to avoid this, so that a line is read as a heading only if there is a space between the pound sign and the next word? I'm also considering some kind of find/replace step, run before the Markdown-to-HTML conversion, that swaps each tag for something like a link to a page listing all the pages with that tag.
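
For the first question, one workaround is to escape tag-like pound signs before conversion. Python-Markdown follows the original Markdown syntax, where the space after a heading `#` is optional (CommonMark, by contrast, requires it), which is why `#TagName` at the start of a line becomes a heading. A minimal sketch, assuming tags only need protecting when they start at column 0 and that this runs on the raw `.md` text before `markdown.markdown()` is called: it prefixes those `#`s with a backslash, a standard Markdown escape that renders as a literal `#`. `escape_leading_tags` is a hypothetical helper name, and the sketch does not skip fenced code blocks, so a `#comment` line inside one would gain a visible backslash.

```python
import re

# A "#" at the very start of a line that is immediately followed by a
# character other than whitespace or another "#": an Obsidian-style tag
# such as "#TagName", not a "# Heading".
LEADING_TAG = re.compile(r"^#(?=[^\s#])", re.MULTILINE)


def escape_leading_tags(text: str) -> str:
    """Turn a line-leading "#Tag" into "\\#Tag" before Markdown conversion.

    "\\#" is a Markdown backslash escape, so the converter renders a
    literal "#" instead of starting a heading.
    """
    return LEADING_TAG.sub(r"\\#", text)


if __name__ == "__main__":
    sample = "#TagName at the start of a line\n# Real heading\n## Also a heading\n"
    print(escape_leading_tags(sample))
```

Real headings ("# Heading", "## Heading") are left alone because the lookahead rejects a space or a second `#` after the first one.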

Edit: The solution I'm going with is a regex find/replace. The pattern is currently built as "#[^\s#][^\s" + string.punctuation + "#]*", which finds tags but ignores headings. Since the ultimate goal is to have the tags link to a tag page anyway, I can handle both in one step by replacing each tag with the relevant link.
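
A minimal sketch of that single-step replace, assuming each tag should become a Markdown link to a hypothetical "tags/<TagName>.html" page (`TAG_RE`, `link_tags`, and that URL layout are illustrative, not from the post). It builds the same pattern, but passes `string.punctuation` through `re.escape` so every character is a literal inside the character class (concatenated raw, `\]` is read as an escaped bracket, so the backslash itself is never excluded), and it uses raw strings because recent Python versions warn about `\s` in a normal string literal. Like the original pattern, it will also match a `#` inside URLs (`page#section`) or code blocks.

```python
from __future__ import annotations

import re
import string

# Whitespace plus every punctuation character (which already includes "#"),
# escaped so each one is a literal inside the character class.
_STOP_CHARS = r"\s" + re.escape(string.punctuation)

# "#", one character that is neither whitespace nor "#" (this skips
# "# Heading" and "## Heading"), then a run of non-stop characters.
TAG_RE = re.compile(r"#[^\s#][^" + _STOP_CHARS + r"]*")


def link_tags(text: str, url_template: str = "tags/{}.html") -> tuple[str, set[str]]:
    """Replace every tag with a Markdown link and return the tags found.

    url_template is a hypothetical layout for the per-tag index pages.
    """
    found: set[str] = set()

    def to_link(match: re.Match[str]) -> str:
        tag = match.group(0)[1:]  # drop the leading "#"
        found.add(tag)
        return f"[#{tag}]({url_template.format(tag)})"

    return TAG_RE.sub(to_link, text), found


if __name__ == "__main__":
    md_text, tags = link_tags("#Homeworld\nFind better tag than #Homeworld.\n# Heading\n")
    print(md_text)
    print(sorted(tags))
```

The returned set is what you would log for building the tag pages. The rewritten text can then go to `markdown.markdown()`, since no replaced line starts with `#` anymore.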

  • Ms. ArmoredThirteen@lemmy.zip (OP) · 8 days ago

    I found a regex checker and it helped so much, thank you for the suggestion! I think I now better understand what's going on, and I was able to use it to modify the pattern to work closer to how I want. Currently I have "#[^\s#][^\s" + string.punctuation + "#]*".

    As I understand it, the pattern looks for a # followed by a character that is neither whitespace nor another # (before, it was matching headings with multiple pound signs). It then keeps consuming as many characters as it can until it runs into whitespace, punctuation, or another # (in the case of multiple tags back to back).
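
    That explanation maps onto the pattern piece by piece. Here is the same idea written with `re.VERBOSE` so each part carries its own comment, checked against a few cases from this thread. It is a sketch: `TAG_PATTERN` is a hypothetical name, and the punctuation goes through `re.escape` rather than being concatenated raw.

```python
import re
import string

TAG_PATTERN = re.compile(
    r"""
    \#        # a literal pound sign (escaped, since a bare # starts a comment in VERBOSE mode)
    [^\s\#]   # one character that is neither whitespace nor another #,
              # so '# Heading' and '## Heading' never match
    [^\s\#"""
    + re.escape(string.punctuation)
    + r"""]*  # then as many characters as possible, stopping at whitespace,
              # punctuation, or another # (the start of a back-to-back tag)
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    for line in ["#Homeworld", "Find better tag than #Homeworld.", "# Heading", "## Sub", "#One#Two"]:
        print(repr(line), "->", TAG_PATTERN.findall(line))
```

    One side effect worth checking: `string.punctuation` includes `_`, `-`, and `/`, so a tag like #sci-fi or a nested Obsidian tag like #books/fiction would be cut short at that character. If your notes use those, you could remove them from the excluded set.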

    My use case is to turn each tag into a link to a page listing every page that has that tag. What I do is blindly replace the tag with a link to where that page should exist, log the tag, and later gather up all the logged tags to generate the list pages. I added punctuation to the pattern because some tags sit in odd places, like the end of a sentence, where the trailing period or comma was being captured and creating a separate tag (e.g. "#Homeworld" at the top of a file vs. "Find better tag than #Homeworld." as a note).
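
    A minimal sketch of the "gather up all the found tags" step, assuming the notes are already loaded into a dict of note name to Markdown text (for example by reading the files from `pathlib.Path(...).glob("**/*.md")`) and that each note is published as a hypothetical `<name>.html`. `collect_tags` and `tag_page_markdown` are illustrative names. Because the pattern stops at punctuation, "#Homeworld" and "#Homeworld." end up on the same tag page.

```python
from __future__ import annotations

import re
import string
from collections import defaultdict

# Same idea as the pattern in the post: "#", a non-space non-"#" character,
# then anything up to whitespace or punctuation (string.punctuation includes "#").
TAG_RE = re.compile(r"#[^\s#][^\s" + re.escape(string.punctuation) + r"]*")


def collect_tags(notes: dict[str, str]) -> dict[str, list[str]]:
    """Map each tag (without the '#') to the sorted names of the notes using it."""
    pages_by_tag: dict[str, set[str]] = defaultdict(set)
    for name, text in notes.items():
        for match in TAG_RE.finditer(text):
            pages_by_tag[match.group(0)[1:]].add(name)
    return {tag: sorted(pages) for tag, pages in sorted(pages_by_tag.items())}


def tag_page_markdown(tag: str, pages: list[str]) -> str:
    """Build the Markdown for one tag's index page (links assume <name>.html)."""
    lines = [f"# Pages tagged {tag}", ""]
    lines += [f"- [{page}]({page}.html)" for page in pages]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    notes = {
        "Earth": "#Homeworld\nSome text about Earth.",
        "Ideas": "Find better tag than #Homeworld. Also #Draft",
    }
    for tag, pages in collect_tags(notes).items():
        print(tag_page_markdown(tag, pages))
```

    Collecting per note (rather than logging during the replace) keeps the tag-to-pages mapping in one place, so each tag page can be generated in a single pass at the end.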

    • jbrains · 7 days ago

      Excellent! Indeed, I'd completely forgotten about H2, H3, and so on, so I'm glad you were comfortable figuring that out!

      I read Mastering Regular Expressions about 25 years ago and it’s one of the best and simplest investments I ever made in my own programming practice. Regex never goes out of style.

      Enjoy!