WordPress and Word Processors

Where do inline styles come from?

When redesigning websites, one of the problems I have to deal with is how to import old content that has inline styles embedded.

A lot of the culprits are web pages that were originally written in a word processing program such as Microsoft Word. These programs make it easy to play around with font sizes, text colours etc. That is fine for printing, but using these documents as web pages causes issues.

What the hell are inline styles?

Your web designer will have taken some time to style your website, choosing fonts, colours and sizes carefully. These choices are written down as CSS rules (CSS stands for Cascading Style Sheets). Generally these rules are held in a separate file or files, and control how the whole website looks.

Inline styles are the same kind of rules, but instead of living in a separate file they are embedded directly in the text of the document. The way CSS rules work means that rules found directly in the text are seen as more important than those in separate Style Sheets – and override them.

So when using a word processor such as Word, each time you highlight some text and change fonts, colours and sizes, hidden formatting is stored in your document. Once that document becomes a web page, the formatting turns into a CSS rule embedded in the text (an inline style).

Word processors also embed a lot more ‘hidden’ code in your documents than just fonts, colours and sizes. Whilst this code is recognised by your word processor, it can play havoc with websites: breaking layouts, displaying odd characters and generally making a mess of things.
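
To make that concrete, here is a small sketch in PHP. The heading text, the colour values and the helper name count_inline_styles() are made up for illustration; none of them come from WordPress or Word. It holds the same heading twice, once roughly as a word processor might export it and once clean, and counts the inline styles in each.

```php
<?php
// Hypothetical heading as exported by a word processor: the styling
// travels inside the tag itself as an inline style.
$pasted = '<h2 style="color: red; font-size: 18pt;">Our Services</h2>';

// The same heading with no inline style. Its look would come from a rule
// in the theme's style sheet instead, for example:  h2 { color: red; }
$clean = '<h2>Our Services</h2>';

// Illustrative helper (an assumption, not a WordPress function):
// counts tags that carry a style="..." or style='...' attribute.
function count_inline_styles( string $html ): int {
    return (int) preg_match_all( '/<[^>]+?\sstyle\s*=\s*(["\'])/i', $html );
}

echo count_inline_styles( $pasted ), "\n"; // 1
echo count_inline_styles( $clean ), "\n";  // 0
```

Running something like this over old content before an import gives a quick feel for how much cleaning lies ahead.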

Why inline styles are a problem.

Keeping the words (the content) separate from the styling (the presentation) makes for easier site-wide changes. For example:

Imagine you’ve produced 10 web pages on your favourite word processor, and each page contains 10 headings which you’ve set as red. In a few months’ time you decide you no longer like red. To change all those headings to blue you’ll have to edit all 10 pages and change each and every heading, all 100 of them.

But without inline styles the styling of the headings can all be controlled using CSS rules in a Style Sheet. So making a few simple changes once in the Style Sheet will affect all 100 of the headings in one go.

How can I avoid inline styles and still use my favourite Word Processor?

This very post was written in Microsoft Word, but only to put the words down. I didn’t set any headings, fonts, font sizes or colours, just some returns to separate the paragraphs from the headings. Setting the headings, making words bold etc. is all done on the actual website once the bare text is pasted in from Word.

Pasting Word Documents into WordPress.

WordPress has steadily improved at stripping out inline styles from Word documents, with the latest releases delivering very clean text devoid of all inline styles. However, if you have difficulties there’s a handy function to remember. When editing a Post or Page, switch to the Visual mode editor (if you’re not using it already) and click on the ‘Paste as text’ button.

If you don’t see these options on the editing screen, try clicking on the ‘Toolbar Toggle’ button for more options.

This will paste in bare text devoid of all styling. You can then use the WordPress editor to set headings, bold text etc.

What if the content is already in WordPress and has inline styles embedded?

Well, the hard way is to edit each offending Post/Page. Switching to the Text mode editor will reveal all the hidden code embedded in the text. You can then carefully remove the offending code, leaving clean text.

There are a couple of tricks to try:

View the offending page in your browser, then highlight and copy all the text. Edit the Post/Page, remove all the existing text, and paste in the text you copied from the front end.

Again, visit the offending Post/Page in your browser and copy the text, but this time paste it into a simple text editor (in Windows, try Notepad). This should strip out all formatting, leaving clean text ready to paste back into WordPress.
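
The Notepad trick can also be approximated in code. The sketch below relies on PHP’s built-in strip_tags() and html_entity_decode(); the function name to_plain_text() and the sample markup are assumptions made for illustration. Treat it as a rough equivalent, not an exact copy of what a text editor does with the clipboard.

```php
<?php
// Illustrative helper (hypothetical name): reduce HTML to bare text,
// roughly what pasting through a plain text editor achieves.
function to_plain_text( string $html ): string {
    // Keep paragraph and heading boundaries as blank lines.
    $text = preg_replace( '#</(p|h[1-6]|div|li)\s*>#i', "\n\n", $html );
    // Keep explicit line breaks as single newlines.
    $text = preg_replace( '#<br\s*/?>#i', "\n", $text );
    // Drop every remaining tag, and with it every inline style.
    $text = strip_tags( $text );
    // Turn entities such as &amp; back into ordinary characters.
    $text = html_entity_decode( $text, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
    // Collapse runs of three or more newlines, then trim both ends.
    return trim( preg_replace( "/\n{3,}/", "\n\n", $text ) );
}

// Made-up sample of Word-style markup.
$sample = '<p style="color:red">Fish &amp; <span style="font-size:14pt">chips</span></p><p>Served daily</p>';
echo to_plain_text( $sample );
```

Like the Notepad route, this throws away headings, bold text and links along with the inline styles, so they need setting again in the WordPress editor.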

Cleaning text programmatically.

This little function, when added to either your theme’s functions.php file or a site-specific plugin, will strip out a lot of the chaff from your content before it makes it as far as the screen.

function clean_post_content( $content ) {
    // Remove inline style attributes (double- or single-quoted) from any tag.
    $content = preg_replace( '/(<[^>]+?)\s+style\s*=\s*("[^"]*"|\'[^\']*\')/i', '$1', $content );
    // Remove opening <font ...> tags that carry attributes.
    $content = preg_replace( '/<font[^>]+>/i', '', $content );
    // Remove empty paragraphs and spans, then any leftover span and font tags.
    // strtr() tries the longest keys first, so empty pairs go before lone tags.
    $post_cleaners = array(
        '<p></p>' => '', '<p> </p>' => '', '<p>&nbsp;</p>' => '',
        '<span></span>' => '', '<span> </span>' => '', '<span>&nbsp;</span>' => '',
        '<span>' => '', '</span>' => '', '<font>' => '', '</font>' => '',
    );
    $content = strtr( $content, $post_cleaners );
    return $content;
}
add_filter( 'the_content', 'clean_post_content' );


This doesn’t change the saved Post/Page data in any way; it cleans the text up each time someone visits that page on your website. It is best used as a stop-gap whilst proper cleaning of the text is taking place.
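
The difference between what is saved and what is shown can be seen outside WordPress with a small sketch. Here demo_clean() is a hypothetical, cut-down stand-in for the cleaning steps, and the sample markup is invented; neither is part of WordPress.

```php
<?php
// Hypothetical, cut-down stand-in for the cleaning steps: strips
// double-quoted style attributes, font tags, spans and one kind of
// empty paragraph.
function demo_clean( string $content ): string {
    $content = preg_replace( '/(<[^>]+) style=".*?"/i', '$1', $content );
    $content = preg_replace( '/<font[^>]+>/', '', $content );
    return strtr( $content, array(
        '<p>&nbsp;</p>' => '',
        '<span>'        => '',
        '</span>'       => '',
        '</font>'       => '',
    ) );
}

// Invented example of content as it might sit in the database.
$saved = '<p style="color: red;"><font face="Arial"><span style="font-size: 14pt;">Hello</span></font></p><p>&nbsp;</p>';

// What a visitor would see after cleaning.
$shown = demo_clean( $saved );

echo $shown, "\n"; // <p>Hello</p>
// $saved is untouched, so the stored copy still holds the inline styles.
```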
Published in Getting Started, Coding.