More dangers from automatically formatted text

I just wanted to add a little bit of extra commentary on Gary’s blog entry about pre-formatted text, especially when it’s done by copy-paste from MS Word or similar document editing software.

On top of the fact that it mangles the characters that are displayed, it also generates horrible HTML. Here is a brief example.

Here’s the text I will use. A centered line with bolded text, a blank line, and some regular text:

This is a test!

Let’s see how bad the HTML turns out.

The following is copy-pasted from OpenOffice.org’s editor directly, though shown in source code view (note: this was displaying very badly so I added some hard returns to make it display within the page):

<!-- 		@page { margin: 0.79in }
P { margin-bottom: 0.08in } 	-->
<p style="margin-bottom: 0in;" align="center">
<strong>This is a test!</strong></p>
<p style="margin-bottom: 0in;" align="center"></p>
<p style="margin-bottom: 0in; font-weight: normal;" align="left">
Let's see how bad the HTML turns out.</p>

This is the same markup, done in an efficient way:

<p align="center"><strong>This is a test!</strong></p>

<p></p>

<p>Let’s see how bad the HTML turns out.</p>

You can see the two snippets are similar, but the copy pasted from OpenOffice.org carries a lot of extra markup that is not needed: a commented-out style block, an inline style attribute on every paragraph, and an explicit align="left". That extra markup adds to the size of your page and to the amount of junk that has to be processed. We don’t know whether search engines ‘grade’ us on how clean our code is, but the pasted code from OpenOffice.org was not clean, so why take the chance?
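To put a rough number on the size difference, here is a minimal Python sketch that measures both snippets from this post. The helper name byte_size is just something I made up for the illustration, and the exact counts depend on how the whitespace survived the copy-paste, so treat the result as a ballpark figure.

```python
# The two snippets from this post, stored as plain strings.
pasted = (
    '<!-- \t\t@page { margin: 0.79in }\n'
    'P { margin-bottom: 0.08in } \t-->\n'
    '<p style="margin-bottom: 0in;" align="center">\n'
    '<strong>This is a test!</strong></p>\n'
    '<p style="margin-bottom: 0in;" align="center"></p>\n'
    '<p style="margin-bottom: 0in; font-weight: normal;" align="left">\n'
    "Let's see how bad the HTML turns out.</p>\n"
)
clean = (
    '<p align="center"><strong>This is a test!</strong></p>\n'
    '<p></p>\n'
    "<p>Let's see how bad the HTML turns out.</p>\n"
)


def byte_size(html: str) -> int:
    """Return the number of bytes the markup takes up when encoded as UTF-8."""
    return len(html.encode("utf-8"))


pasted_size = byte_size(pasted)
clean_size = byte_size(clean)
ratio = pasted_size / clean_size
print(f"pasted: {pasted_size} bytes, clean: {clean_size} bytes, ratio: {ratio:.1f}x")
```

For three short lines of text the pasted version comes out at more than twice the size of the hand-written one, and that overhead repeats for every paragraph you paste.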

To avoid this, my best suggestion is to open a copy of Notepad (or a similar plain-text editor), paste the text into there first, and then copy it from there into your page. You will have to re-add all of the formatting details yourself, but you will end up with much cleaner code, which will make your website less bulky, can lower bandwidth utilization, and could reasonably make your site easier to crawl.
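If you would rather keep the tags and only throw away the junk, the cleanup can also be scripted. The following is a rough sketch using Python's standard html.parser module, not a polished tool. The names DROP_ATTRS, PasteCleaner and clean_pasted_html are my own inventions for this example, and the list of attributes to drop is an assumption you should adjust to whatever your editor produces.

```python
from html import escape
from html.parser import HTMLParser

# Attributes that word processors tend to add; this list is an assumption.
DROP_ATTRS = {"style", "class", "lang"}


class PasteCleaner(HTMLParser):
    """Re-emit HTML while dropping comments and unwanted attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _open_tag(self, tag, attrs, closing=""):
        # Keep only the attributes that are not on the drop list.
        kept = [(name, value) for name, value in attrs if name not in DROP_ATTRS]
        text = "".join(
            f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
            for name, value in kept
        )
        return f"<{tag}{text}{closing}>"

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._open_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> stay self-closing.
        self.parts.append(self._open_tag(tag, attrs, closing=" /"))

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape &, < and > again on the way out.
        self.parts.append(escape(data, quote=False))

    # handle_comment is deliberately not overridden: the default does
    # nothing, so comments (including the pasted style block) are dropped.


def clean_pasted_html(html: str) -> str:
    cleaner = PasteCleaner()
    cleaner.feed(html)
    cleaner.close()  # flush any buffered text
    return "".join(cleaner.parts).strip()


sample = (
    '<!-- @page { margin: 0.79in } -->'
    '<p style="margin-bottom: 0in;" align="center">'
    '<strong>This is a test!</strong></p>'
)
print(clean_pasted_html(sample))
```

This keeps the align attribute and the paragraph structure but removes the commented-out style block and the inline styles. It does not try to handle script or style elements or to remove empty paragraphs, so check the output before you publish it.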
