import an xml into phpmyadmin

Given that WordPress doesn’t allow you to import a local file (which is quite unfair, in my opinion), you can convert a WordPress site into an XML file and then import it into your local database via phpMyAdmin.

But you have to format the XML carefully: look at how phpMyAdmin exports an XML file and format yours according to that model.
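As a minimal sketch, assuming the layout phpMyAdmin uses for its own XML export (as far as I can tell: a pma_xml_export root, one database element, and one table element per row containing column elements), the Python below builds such a file from rows you have already extracted. The database, table and column names in the usage are hypothetical. Always compare the result with an export from your own phpMyAdmin, which may contain extra parts (for example a namespace declaration or a structure section) worth copying.

```python
import xml.etree.ElementTree as ET


def build_pma_xml(database: str, table: str, rows: list[dict[str, str]]) -> str:
    """Build an XML document laid out like a phpMyAdmin XML export (data only).

    Assumed layout, to be checked against a real export:
    <pma_xml_export version="1.0">
      <database name="...">
        <table name="...">            one element per row
          <column name="...">value</column>
        </table>
      </database>
    </pma_xml_export>
    """
    root = ET.Element("pma_xml_export", {"version": "1.0"})
    db = ET.SubElement(root, "database", {"name": database})
    for row in rows:
        record = ET.SubElement(db, "table", {"name": table})
        for column, value in row.items():
            # ElementTree escapes &, < and > in the text for us
            ET.SubElement(record, "column", {"name": column}).text = value
    ET.indent(root)  # pretty-print; needs Python 3.9+
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="utf-8"?>\n' + body


if __name__ == "__main__":
    rows = [{"ID": "1", "post_title": "Hello world"}]  # hypothetical data
    with open("wp_posts.xml", "w", encoding="utf-8") as f:
        f.write(build_pma_xml("wordpress_local", "wp_posts", rows))
```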

regex replace html tags

When exporting an ODT file to EPUB, LibreOffice can make many mistakes, such as marking a second-level heading not with <h2> but with <p class="para0">. To fix this error, you can use a regex like this:

find: <p class="para0">(.*?)</p>
replace: <h2>\1</h2>

and so on for similar cases (some editors write the back-reference as $1 instead of \1).
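If you prefer a script to your editor's find-and-replace, here is a minimal Python sketch of the same substitution. It assumes you have unzipped the EPUB into a folder and that LibreOffice put the chapters in .xhtml files (check your own export); the folder name is hypothetical. The re.DOTALL flag also lets a heading that spans two lines match.

```python
import re
from pathlib import Path

# Same find/replace as above; DOTALL lets "." also match newlines,
# and the lazy .*? still stops at the first </p>.
PARA0_TO_H2 = re.compile(r'<p class="para0">(.*?)</p>', re.DOTALL)


def fix_headings(xhtml: str) -> str:
    return PARA0_TO_H2.sub(r"<h2>\1</h2>", xhtml)


def fix_folder(folder: str) -> None:
    # Assumption: the unzipped EPUB keeps its chapters in *.xhtml files.
    for path in Path(folder).rglob("*.xhtml"):
        text = path.read_text(encoding="utf-8")
        path.write_text(fix_headings(text), encoding="utf-8")


if __name__ == "__main__":
    fix_folder("unzipped-epub")  # hypothetical folder name
```

When you zip the folder back into an EPUB, the mimetype file must come first and be stored uncompressed, as the EPUB container format requires.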

regex “whatever”

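If you want to match “whatever” (any word or character), regardless of its length, you can simply use

(.*?)

Here . matches any character except a newline, * repeats it any number of times, and ? makes the repetition lazy, so the match stops as early as possible.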

For example, if you want to delete everything between <span> and </span>, tags included, as in the following line

many words <span>many other words here</span> other words

you can use

find: <span>(.*?)</span>
replace: (leave empty)

The result will be:

many words other words
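The Python sketch below runs the same deletion with the standard re module and shows two details worth knowing: the replacement actually leaves a double space behind (adding \s* in front of the tag removes it), and the ? matters, because a greedy .* would swallow everything from the first <span> to the last </span>. The sample strings are made up for illustration.

```python
import re

line = "many words <span>many other words here</span> other words"

# Lazy match, as in the example above: the spaces on both sides of the span remain.
lazy = re.sub(r"<span>(.*?)</span>", "", line)
print(repr(lazy))  # 'many words  other words'

# Also eat the whitespace before the tag to end up with a single space.
tidy = re.sub(r"\s*<span>(.*?)</span>", "", line)
print(repr(tidy))  # 'many words other words'

# With two spans, greedy .* deletes the text between them as well.
two = "a <span>x</span> b <span>y</span> c"
greedy = re.sub(r"<span>(.*)</span>", "", two)
lazy_two = re.sub(r"<span>(.*?)</span>", "", two)
print(repr(greedy))    # 'a  c'
print(repr(lazy_two))  # 'a  b  c'
```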

automatic crop pdf margins

I found this excellent tool: PdfCropMargins, a very lightweight app (for both Linux and Windows) that automatically crops the white margins of a PDF (top, bottom, left, right), even when they are very irregular (different from page to page). It does so with great accuracy and without increasing the PDF size.

You can also use its GUI, launching it from the command line, for example:

pdf-crop-margins -gui "original-pdf.pdf" -o "target-pdf.pdf"
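For many files at once, a small Python wrapper can call the same command without -gui, so that it crops automatically. This is a minimal sketch: it assumes pdf-crop-margins is installed and on your PATH, and the "-cropped" suffix and the folder name are hypothetical choices of mine.

```python
import shutil
import subprocess
from pathlib import Path


def crop_command(src: Path, dst: Path) -> list[str]:
    # Same invocation as the GUI example, minus -gui: crop automatically.
    return ["pdf-crop-margins", str(src), "-o", str(dst)]


def crop_folder(folder: str) -> None:
    if shutil.which("pdf-crop-margins") is None:
        raise RuntimeError("pdf-crop-margins not found on PATH")
    for src in sorted(Path(folder).glob("*.pdf")):
        if src.stem.endswith("-cropped"):
            continue  # skip files produced by an earlier run
        dst = src.with_name(src.stem + "-cropped.pdf")  # hypothetical naming
        subprocess.run(crop_command(src, dst), check=True)


if __name__ == "__main__":
    crop_folder("pdfs")  # hypothetical folder name
```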

save only some pages from a pdf

You can use Okular, of course: print to file, selecting the pages you want. But this way you get an image PDF. If you want to keep a text PDF, you can use a program like PdfArranger, which lets you save only the pages you want from a whole PDF, keeping them as a searchable (text) PDF.
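If you prefer a script, the third-party pypdf library can do the same thing: copying page objects from one PDF to another keeps their text, so the result stays searchable. A minimal sketch, assuming pypdf is installed (pip install pypdf); the "1-3,7" page-range syntax and the file names are hypothetical choices for illustration.

```python
def parse_pages(spec: str) -> list[int]:
    """Turn a 1-based spec such as "1-3,7" into 0-based indices [0, 1, 2, 6]."""
    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = (int(n) for n in part.split("-"))
            indices.extend(range(start - 1, end))
        else:
            indices.append(int(part) - 1)
    return indices


def save_pages(src: str, dst: str, spec: str) -> None:
    # Imported here so parse_pages works even without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    for i in parse_pages(spec):
        # The page is copied as-is, text included (nothing is rasterized).
        writer.add_page(reader.pages[i])
    with open(dst, "wb") as f:
        writer.write(f)


if __name__ == "__main__":
    save_pages("whole.pdf", "selection.pdf", "1-3,7")  # hypothetical files
```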

quickly type special characters

On Linux you can use the Compose key, setting it (in System Settings) to, for example, RightCtrl (the right Ctrl key). On an Italian keyboard RightCtrl is a better choice than AltGr, because AltGr is needed for characters like '[', ']', '@' and '#', which would otherwise be inaccessible.

This way, when you type 1) RightCtrl, 2) then ^, 3) then o, you get ô. You don’t need to press all the keys at the same time: press them one after the other.

To sum up, the main symbols:

  • RightCtrl + ^ + o = ô
  • RightCtrl + " + o = ö
  • RightCtrl + ' + o = ó
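Under the hood, each of these accents corresponds to a Unicode combining mark that merges with the letter into a single precomposed character. The Python sketch below only illustrates that relationship for the three sequences above; it is not how the Compose key is actually implemented (the system looks each sequence up in a table of Compose definitions), and the COMBINING table and compose function are hypothetical names.

```python
import unicodedata

# Hypothetical model of the three sequences: the key typed after Compose
# selects a Unicode combining mark for the following letter.
COMBINING = {
    "^": "\u0302",  # COMBINING CIRCUMFLEX ACCENT
    '"': "\u0308",  # COMBINING DIAERESIS
    "'": "\u0301",  # COMBINING ACUTE ACCENT
}


def compose(accent: str, letter: str) -> str:
    # NFC normalization merges letter + combining mark into one character.
    return unicodedata.normalize("NFC", letter + COMBINING[accent])


if __name__ == "__main__":
    for accent in COMBINING:
        print(f"RightCtrl + {accent} + o =", compose(accent, "o"))
```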

re-ocr a pdf with Adobe

You have to 1) save the old searchable PDF as TIFF images (one per page), 2) OCR the TIFF images into searchable PDFs, and 3) combine the resulting PDFs into a single PDF.
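Step 3 can also be scripted. A minimal sketch with the third-party pypdf library (pip install pypdf), assuming the per-page PDFs sit alone in one folder and their names contain the page number; a natural sort keeps page-2 before page-10 when the numbers aren't zero-padded. Folder and file names are hypothetical.

```python
import re
from pathlib import Path


def natural_key(path: Path) -> list:
    # "page-10.pdf" -> ["page-", 10, ".pdf"], so page-2 sorts before page-10
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", path.name)]


def merge_pdfs(folder: str, dst: str) -> None:
    from pypdf import PdfWriter  # third-party: pip install pypdf

    writer = PdfWriter()
    for pdf in sorted(Path(folder).glob("*.pdf"), key=natural_key):
        writer.append(str(pdf))  # appends all pages of that file
    with open(dst, "wb") as f:
        writer.write(f)


if __name__ == "__main__":
    # Write the result outside the input folder so a rerun doesn't pick it up.
    merge_pdfs("ocr-pages", "re-ocr.pdf")  # hypothetical names
```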

problems with phpmyadmin

Sometimes phpMyAdmin (or MySQL) doesn’t let you do things it should allow, such as changing the encoding of a column (or of a table or database), or changing the storage engine of tables.

After many failed attempts with SQL queries, I found that the easiest solution is to:

  • export the database
  • make the changes you want in a text editor such as Kate, e.g. replace the old encoding with the new one
  • import the (modified) database (after deleting or renaming the old one)
  • done!
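The text-editor step can be scripted too. The Python sketch below is a minimal example, assuming a phpMyAdmin SQL export in which each CREATE TABLE ends with a line like ") ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;". It only touches those lines, so the INSERT data is left alone. The engine, charset, collation and file names are examples; column-level CHARACTER SET clauses, if your dump has them, are not handled, and only the declarations are rewritten.

```python
import re


def rewrite_table_options(dump_sql: str, old_engine: str, new_engine: str,
                          old_charset: str, new_charset: str,
                          new_collation: str) -> str:
    """Change engine, charset and collation on the closing line of each CREATE TABLE."""
    out = []
    for line in dump_sql.splitlines(keepends=True):
        # Assumption: table options sit on a line starting with ")" containing ENGINE=
        if line.startswith(")") and "ENGINE=" in line:
            # \b keeps e.g. "utf8" from matching inside "utf8mb3"
            line = re.sub(rf"\bENGINE={re.escape(old_engine)}\b",
                          f"ENGINE={new_engine}", line)
            line = re.sub(rf"\bCHARSET={re.escape(old_charset)}\b",
                          f"CHARSET={new_charset}", line)
            line = re.sub(rf"\bCOLLATE={re.escape(old_charset)}_\w+",
                          f"COLLATE={new_collation}", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    with open("export.sql", encoding="utf-8") as f:  # hypothetical file names
        sql = f.read()
    sql = rewrite_table_options(sql, "MyISAM", "InnoDB",
                                "latin1", "utf8mb4", "utf8mb4_general_ci")
    with open("export-modified.sql", "w", encoding="utf-8") as f:
        f.write(sql)
```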