Friday, February 22, 2013

Converting Word DOC or DOCX to Mobi

Ask any group of writers what software to use and you will probably get as many opinions as there are writers. I have been offered Scrivener, WriteRoom, DarkRoom, OpenOffice, Q10, classic pen and paper, Google Docs, etc. All of them have their advantages, but for me, they all have one huge disadvantage: unfamiliarity. I don't want to waste time trying something new. I have a perfectly good word processor on my computer that has been with me since I was writing term papers in the 90s. Microsoft Word is not the best thing out there, but I know the ins and outs of the program better than anything else. So when it comes time to write, I don't want to have to think about where the Paste Special command is hiding.

If you are like me and make do with Microsoft Word, then you may have encountered issues with turning your .doc or .docx files into HTML that works for Kindle. Hopefully, this checklist will guide you through the process.


1. Complete your project.
You don't want to start converting your file until you are absolutely sure you are done. Did you have it edited? Did you re-read it? Are all of your i's dotted? If not, spend some more time polishing your work. Making corrections after this process may result in some funky formatting issues.

I am going to show you the evolution of one particular excerpt and how the coding will change in each step. This is how the excerpt looks in the .docx file:
   This cannot be happening to me again!
   Seefer Elliot hugged the wall of the cold, dark basement below Harrison Middle School. A few weeks ago, he grew quite familiar with this subterranean corridor and the workshop at the end. At one point, trapped inside while would-be alien captors ran amok in the school above him. This time, however, he was on the outside of the uninviting room while a gun-toting secret agent beckoned for him.
You'll notice that my file has no added space between paragraphs and that the first line of each paragraph is indented 0.25 inches. You may not want this particular formatting style for your book, but the same process works for whatever formatting you prefer.

2. Set up your HTML seeds.
In Microsoft Word, open your finished .doc or .docx file. Choose "Save as > Other Formats" from the Office dropdown menu. Choose "Webpage, Filtered" from the "save as type" box. Let's call this file garbage.html.

Now we need a seed file that will get us going on the right foot. Amazon likes it simple, so we'll make it that way. I posted my template for an easy-to-use seed file here. Copy and paste the code on this page to a new text document. I recommend using Dreamweaver or a similar webpage editor if you have it. Otherwise, a simple editor like Wordpad or Simpletext will do just fine. Save it as seed.html and keep it for future use.

3. Copy what you need.
Now open garbage.html in the same text editor.

You will see a lot of junk that you may not be used to seeing. Most of it is the styling code needed to make your page look exactly like it did in Word. But we don't want any of that. Scroll all the way down and find the first sentence of your first chapter. Drag to select all of the text of that first chapter. Mine looks like this:


<p class=MsoNormal style='margin-bottom:1.0pt;text-indent:.25in'><i style='mso-bidi-font-style:normal'><span style='font-size:12.0pt;mso-bidi-font-size:11.0pt;line-height:115%;font-family:"Times New Roman";color:black'>This cannot be happening to me again!<o:p></o:p></span></i></p>
<p class=MsoNormal style='margin-bottom:1.0pt;text-indent:.25in'><span style='font-size:12.0pt;mso-bidi-font-size:11.0pt;line-height:115%;font-family: "Times New Roman";color:black'>Seefer Elliot hugged the wall of the cold, dark basement below Harrison Middle School. A few weeks ago, he grew quite familiar with this subterranean corridor and the workshop at the end. At one point, trapped inside while would-be alien captors ran amok in the school above him. This time, however, he was on the outside of the uninviting room while a gun-toting secret agent beckoned for him.<o:p></o:p></span></p>

Copy it and move on to the next step.

4. Paste into new template.
You now have a selection from your garbage.html file saved to your clipboard. Go over to seed.html and find "[CHAPTER 1 TEXT]". Drag to select this line, then paste your saved text from the clipboard.

Repeat steps 3 and 4 until you have transferred all of your chapters into seed.html.
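
If you have a lot of chapters, the copy-and-paste in steps 3 and 4 can also be scripted. Here is a minimal Python sketch of the idea. The function name fill_placeholder and the tiny stand-in template are my own inventions for illustration, and your real seed.html will look different; only the "[CHAPTER 1 TEXT]" marker comes from the seed file described above.

```python
def fill_placeholder(seed_html: str, placeholder: str, chapter_html: str) -> str:
    """Swap one placeholder in the seed template for a chapter's HTML."""
    if placeholder not in seed_html:
        raise ValueError(f"placeholder not found: {placeholder}")
    # Replace only the first occurrence so each placeholder is filled once.
    return seed_html.replace(placeholder, chapter_html, 1)


# Hypothetical stand-in for seed.html; the real template may look different.
seed = "<html><body>\n<h2>Chapter 1</h2>\n[CHAPTER 1 TEXT]\n</body></html>"

# Text copied out of garbage.html (shortened here).
chapter = "<p class=MsoNormal>This cannot be happening to me again!</p>"

filled = fill_placeholder(seed, "[CHAPTER 1 TEXT]", chapter)
print(filled)
```

For a real book you would read seed.html and your copied chapters from disk, call the function once per chapter, and write the result back out.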

5. Correcting the code.
Kindle won't make any sense of "MsoNormal", "mso-bidi-font-style", "o:p", or any of the other junk that Word puts in its HTML. We need to simplify.

In my example below, I am going to use the exact lines from my excerpt above. Yours may be different depending on how you formatted your book. Keep a keen eye out for the differences when searching for the text. The code you put in its place will be the same no matter what.

(A) Get rid of Word backwards compatibility. We are making this document for Kindle. It won't be seen in Word ever again. Get rid of the tags that Word injects for its own purposes. In my case, I had a 1.15 break between each paragraph. The [o:p] tags are there to reinsert that break if I open in Word.
Find:
<o:p></o:p>
Replace with: (nothing). Just leave the replace with field blank or add a space if need be.

(B) Correct the beginning of each paragraph.
Find:
<p class=MsoNormal style='margin-bottom:1.0pt;text-indent:.25in'>
Replace with:
<p style="margin:0.00% 0.00% 0.21%; text-indent:1.5em; line-height:115%; widows:0; orphans:0; ">

(C) Redefine italics.
Find:
<i style='mso-bidi-font-style:normal'><span style='font-size:12.0pt;mso-bidi-font-size:11.0pt;line-height:115%;font-family:"Times New Roman";color:black'>
Replace with:
<span style=" font-size:1.0rem; font-style:italic">
That replacement merges two opening tags into one, so merge the matching closing tags as well. Find:
</span></i>
Replace with:
</span>
Then find:
<i style='mso-bidi-font-style:normal'>
and replace with:
<span style=" font-size:1.0rem; font-style:italic">
Finally, find any remaining:
</i>
And replace with:
</span>

(D) Redefine regular font style.
Find:
<span style='font-size:12.0pt;mso-bidi-font-size:11.0pt;line-height:115%;font-family: "Times New Roman";color:black'>
Replace with:
<span style=" font-size:1.0rem">

(E) Clean up any other styles. I don't use boldface, underline or any other styles in my text, so I don't have the need to clean it up. If you do, simply search for the applicable text and swap out the Word coding with a [span] styling. Note that font-style only controls italics: bold is font-weight:bold, underline is text-decoration:underline, and strike-through is text-decoration:line-through.
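
Since I don't use these styles myself, I can only offer a rough Python sketch of what step (E) might look like. It assumes Word marks bold, underline and strike-through with [b], [u] and [s] tags (possibly carrying their own style attributes), and the mso-bidi-font-weight attribute in the sample is a guess at what Word emits. The function name tags_to_spans and the TAG_STYLES table are made up for this example.

```python
import re

# Assumed mapping from Word's presentational tags to inline CSS.
TAG_STYLES = {
    "b": "font-weight:bold",
    "u": "text-decoration:underline",
    "s": "text-decoration:line-through",
}


def tags_to_spans(html: str) -> str:
    """Turn <b>, <u> and <s> tags into styled <span> tags."""
    for tag, css in TAG_STYLES.items():
        # Match the opening tag with or without attributes. Requiring
        # whitespace or '>' right after the name keeps <body>, <br>, <ul>
        # and <span> from matching by accident.
        html = re.sub(rf"<{tag}(\s[^>]*)?>",
                      f'<span style=" font-size:1.0rem; {css}">', html)
        html = html.replace(f"</{tag}>", "</span>")
    return html


sample = "<b style='mso-bidi-font-weight:normal'>Stop!</b> he said."
converted = tags_to_spans(sample)
print(converted)
```

Check the result against your own garbage.html before trusting it, because Word may express these styles differently depending on how the book was formatted.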

The final product should look like this:

<p style="margin:0.00% 0.00% 0.21%; text-indent:1.5em; line-height:115%; widows:0; orphans:0; "><span style=" font-size:1.0rem; font-style:italic">This cannot be happening to me again!</span></p>
<p style="margin:0.00% 0.00% 0.21%; text-indent:1.5em; line-height:115%; widows:0; orphans:0; "><span style=" font-size:1.0rem">Seefer Elliot hugged the wall of the cold, dark basement below Harrison Middle School. A few weeks ago, he grew quite familiar with this subterranean corridor and the workshop at the end. At one point, trapped inside while would-be alien captors ran amok in the school above him. This time, however, he was on the outside of the uninviting room while a gun-toting secret agent beckoned for him.</span></p>
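
If you would rather not run every find-and-replace by hand, steps (A) through (D) can be sketched as one small Python function. This is only an illustration under a few assumptions of mine: the name clean_word_html is made up, the patterns match any attributes on Word's [p] and [span] tags instead of the exact strings from my file (Word is not consistent about spacing inside them), and each "</span></i>" pair is collapsed into a single "</span>" so that every span that gets opened is closed exactly once.

```python
import re

P_OPEN = ('<p style="margin:0.00% 0.00% 0.21%; text-indent:1.5em; '
          'line-height:115%; widows:0; orphans:0; ">')
ITALIC = '<span style=" font-size:1.0rem; font-style:italic">'
PLAIN = '<span style=" font-size:1.0rem">'
WORD_I = "<i style='mso-bidi-font-style:normal'>"


def clean_word_html(html: str) -> str:
    """Apply steps (A) through (D) to Word's HTML, in order."""
    # (A) Drop Word's empty o:p tags.
    html = html.replace("<o:p></o:p>", "")
    # (B) Swap every MsoNormal paragraph opener for the simple one.
    html = re.sub(r"<p class=MsoNormal[^>]*>", P_OPEN, html)
    # (C) An <i> wrapped directly around a <span> becomes one italic span,
    # so its pair of closing tags becomes one closing tag too.
    html = re.sub(re.escape(WORD_I) + r"<span style='[^']*'>", ITALIC, html)
    html = html.replace("</span></i>", "</span>")
    html = html.replace(WORD_I, ITALIC)
    html = html.replace("</i>", "</span>")
    # (D) Any Word span still left (single-quoted style) becomes plain.
    html = re.sub(r"<span style='[^']*'>", PLAIN, html)
    return html


excerpt = """
<p class=MsoNormal style='margin-bottom:1.0pt;text-indent:.25in'><i style='mso-bidi-font-style:normal'><span style='font-size:12.0pt;mso-bidi-font-size:11.0pt;line-height:115%;font-family:"Times New Roman";color:black'>This cannot be happening to me again!<o:p></o:p></span></i></p>
<p class=MsoNormal style='margin-bottom:1.0pt;text-indent:.25in'><span style='font-size:12.0pt;mso-bidi-font-size:11.0pt;line-height:115%;font-family: "Times New Roman";color:black'>Seefer Elliot hugged the wall of the cold, dark basement below Harrison Middle School.<o:p></o:p></span></p>
""".strip()

cleaned = clean_word_html(excerpt)
print(cleaned)
```

The order matters here: the italic replacements in (C) have to run before the catch-all span replacement in (D), otherwise the combined [i][span] opener would no longer be found.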

6. Save and Review.
Once you have finished finding and replacing all of the bad Word styling, save the file under whatever name you want. Let's call it book.html. Open book.html in your browser and make sure everything looks good. If it does, then you are ready to bundle it up and publish!
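
A browser will happily display a page that still has leftover Word code or a stray closing tag in it, so a quick automated check can catch what your eyes miss. The Python sketch below is one way to do that, using only the standard library. The names WORD_MARKERS, leftover_markers, SpanCounter and span_problems are all mine, and the list of markers is just the three pieces of Word junk mentioned in step 5, not a complete list.

```python
from html.parser import HTMLParser

# Telltale strings from step 5; extend this list for your own file.
WORD_MARKERS = ("MsoNormal", "mso-", "<o:p>")


def leftover_markers(html: str) -> list:
    """Return the Word markers that still appear in the HTML."""
    return [marker for marker in WORD_MARKERS if marker in html]


class SpanCounter(HTMLParser):
    """Track <span> nesting to spot unbalanced tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # spans currently open
        self.stray_closes = 0   # </span> tags with nothing to close

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span":
            if self.depth == 0:
                self.stray_closes += 1
            else:
                self.depth -= 1


def span_problems(html: str) -> tuple:
    """Return (stray closing spans, spans never closed)."""
    counter = SpanCounter()
    counter.feed(html)
    counter.close()
    return counter.stray_closes, counter.depth


good = '<p><span style=" font-size:1.0rem">Fine.</span></p>'
bad = "<p class=MsoNormal><span>Oops.</span></span><o:p></o:p></p>"
print(leftover_markers(good), span_problems(good))
print(leftover_markers(bad), span_problems(bad))
```

To check the real thing, read book.html into a string with open("book.html", encoding="utf-8").read() and pass it to both functions. An empty list and two zeros mean nothing obvious was left behind.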
